-
Notifications
You must be signed in to change notification settings - Fork 269
docs: Add Managed Kafka Connect terraform sample for Clusters #876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for Clusters * Add sample code for creating a Kafka Connect cluster
Here is the summary of changes. You are about to add 3 region tags.
This comment is generated by snippet-bot.
|
Corrected tags.
gcp_config { | ||
access_config { | ||
network_configs { | ||
subnet = "projects/${data.google_project.default.number}/regions/us-central1/subnetworks/default" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this value be linked to a subnet resource? the hardcoding of the region and subnet name is suboptimal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I acknowledge this is suboptimal, I'm hardcoding the subnet resource due to a current limitation with deleting network attachments associated with Managed Kafka Connect clusters. I had previously attempted creating a Subnet resource as part of the code example, but it was causing test failures. (For more information, please see here)
However, hardcoding the subnet resource also seems to be no longer working (although it initially passed the test cases). This could be due to the changing test setup across runs. Do you have any suggestions on how we may proceed to pass the tests given this limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've committed a workaround using a provisioner for deleting leaked resources: 612a5ac
Seems to be working with test cases. Marking as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(moved my review comment to this thread)
We do not support local-exec in samples. We do not want to encourage our users to use this functionality.
If there's an upstream issue, it needs to be fixed at the resource level.
If the sample can't be tested (either now, or in general), then we can disable the terraform apply/destroy part of tests.
The google_managed_kafka_cluster takes 30 minutes to create, the google_managed_kafka_connect_cluster takes 20 minutes, unsure if this is expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. Given that there is a known issue which prevents resource deletion, I believe the best path forward for now would be to disable the terraform apply/destroy part of the tests as you have suggested. Once the issue is resolved, we can update the examples to enable terraform apply/destroy.
Can you please guide how to disable the terraform apply/destroy part of tests?
(Also, it is indeed expected that the google_managed_kafka_cluster and google_managed_kafka_connect_cluster take 30 & 20 minutes to create, respectively. For reference, see here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To give you some context, there's an issue with GKE networking that leaks Network Attachment resources in a Subnet even after a ConnectCluster is deleted. This means that if we add a Subnet resource to the same Terraform config, upon TF delete action the Subnet resource's deletion will hang, as there's still a dangling Network Attachment left. The leak eventually goes away, but its indefinite - perhaps hours, perhaps days. The leak issue is pretty complex to fix and doesn't have staffing yet - so it may not land for months/quarters.
I think the best way forward is to include the subnet resources hardcoded as strings in the Terraform samples. It allows us to exercise the apply/delete logic for the ConnectCluster, which is what we care about here.
For most customers, their network resources will be managed in a separate file, often by separate teams altogether. And the lifetime of Network/Subnet resources is often longer than those of using resources like ConnectClusters. So our customers may still be ok, even with this unfortunate leak issue preventing them from putting network + ConnectCluster resources in the same file.
How does that sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally: Our samples should reflect best practices in real world applications. We have some samples where the configuration required to make the sample run cleanly is only useful in a CI testing instance, and not code we would suggest a user configure. In these cases, we can disable the terraform apply/destroy
(among other steps), but I believe this is the best solution here.
You can disable this by adding a test.yaml file with skip: true
. example. (The metadata.name is roughly the concatenation of the folder hierarchy of the sample)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the best-practice sample is not going to run without modification (ie: applying the Network resources separate from the ConnectCluster resource), is it still okay to recommend that? Should we add a comment with the limitation? We don't want to mislead users by showing them a sample that won't work out of the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that staggered application is required, you can describe that in the devsite page. You can also use additional [# START
and [# END
tags to separate out these resources so you can describe them in context. (example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(apologies for the single comments instead of review comments, but please address the code review comments)
Description
Fixes https://b.corp.google.com/issues/430087669
Note: If you are not associated with Google, open an issue for discussion before submitting a pull request.
Checklist
Readiness
Style
guide
Testing
I have performed tests described in the Contributing guide:
terraform apply
terraform fmt
checkIntended location
Location(s): https://cloud.google.com/managed-service-for-apache-kafka/docs/connect-cluster/create-connect-cluster#create-connect-cluster
Reason:
API enablement
Review