Mount Kubernetes volumes using 9pfs #169
base: main
Conversation
Signed-off-by: Charalampos Mainas <charalampos.mainas@gmail.com>
Implement a method of mounting local paths inside unikernels (Unikraft only at this moment) based on a dedicated annotation. This can be used in conjunction with NFS shared volumes in a Kubernetes cluster, when an NFS provider has been configured. The new annotation is called "com.urunc.unikernel.ninePFSMntPoint" and it holds the value of the mount point inside the unikernel, in other words the value of "mountPath" specified in the volumeMounts field of the Kubernetes template. The recommended usage is to set it in the Pod specification of the Kubernetes template. In the Unikontainer Exec method, urunc searches the list of mount points passed from containerd for a match with the mount point specified by the annotation. Once found, urunc can determine the local path of the PersistentVolume or PersistentVolumeClaim on the worker node and instruct qemu to mount it into the unikernel via the 9pfs provider. Signed-off-by: Ciprian Barbu <ciprian.barbu@thalesgroup.com>
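As a rough illustration of the flow this description outlines (reading the annotation, matching it against the mounts handed over by containerd, and building the QEMU 9pfs arguments), here is a minimal Go sketch. It is not urunc's actual code: the function names, the `fs0` fsdev id and the `sharedfs` mount tag are invented for the example.

```go
package mountsketch

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

const annotMntPoint = "com.urunc.unikernel.ninePFSMntPoint"

// resolveHostPath walks the mounts that containerd passed in the OCI spec and
// returns the host-side Source of the entry whose Destination matches the
// mount point requested by the annotation.
func resolveHostPath(spec *specs.Spec) (hostPath, guestPath string, err error) {
	guestPath = spec.Annotations[annotMntPoint]
	if guestPath == "" {
		return "", "", fmt.Errorf("annotation %s not set", annotMntPoint)
	}
	for _, m := range spec.Mounts {
		if m.Destination == guestPath {
			return m.Source, guestPath, nil
		}
	}
	return "", "", fmt.Errorf("no mount entry matches %s", guestPath)
}

// ninePFSArgs returns QEMU arguments that expose hostPath to the guest over
// virtio-9p; the guest then mounts the "sharedfs" tag at the annotated path.
func ninePFSArgs(hostPath string) []string {
	return []string{
		"-fsdev", "local,id=fs0,path=" + hostPath + ",security_model=mapped",
		"-device", "virtio-9p-pci,fsdev=fs0,mount_tag=sharedfs",
	}
}
```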
Force-pushed from f8acde4 to 12331ed
ok-to-test
Hi @ciprian-barbu! Thanks for taking the time to craft this! @cmainas will follow up on the rationale of using the shared fs and we can take it from there. Regardless, we are in the process of moving the repo to its own org, and we will need to refactor (remove/update) some parts of the CI. For instance, a step is not succeeding due to GitHub security limitations on accessing secrets. Bear with us and we will provide steps to rebase your PR on top of a "fixed" CI workflow. In the meantime, please reach out to us if you need any additional info. Thanks again for your contribution, looking forward to getting it merged!
Hello @ciprian-barbu, thank you for opening this PR. Indeed, mounting volumes into the unikernel is necessary for a lot of use cases. We were planning to add support for shared-fs in the upcoming release, so your PR comes at a very good time. The approach in the PR looks good and, except for some minor comments, merging it would not be a problem. It would be nice, though, if we could talk over two points: a) shared-fs security and b) scalability of the proposed approach. Security: Would it be ok to wait for adding support for chroot first, before we merge this PR? Unfortunately, both changes touch a few common lines and the rebase might require some effort, but I will be happy to help with it. Scalability:
Please let me know if I did not understand your approach correctly. Also, what do you think about the above two approaches? Thank you again for the effort and your contribution. P.S.: I think the runners should be ok now with the repo transfer. Therefore, rerunning the tests should work (hopefully). Edit: For re-running the tests, you will also need to rebase the PR over the
Hello, see my responses below.
Sure, it is fine with me, you are the maintainers, and our timeline in the project is not that tight. I haven't thought about the security aspects, so anything you propose is fine with me.
Indeed, I initially thought about allowing multiple volume mounts, but I got stuck at some point and decided to create a PR to ask for comments. I will look into this option, maybe the new annotation can point to a list of strings, instead of just one.
I think it's ok, do you have some work in progress I can look at?
The idea was to send the change out for comments, which you provided in plenty, so thank you!
Best regards,
Hello,
That is correct, but typically volumes are passed as bind mount entries in the container configuration. Therefore, I would vote to build on your PR and, instead of using the annotation to define the mount point, check the mount type of each entry and, if it is a bind mount, store it in
I think we can keep working in this PR. Even a single mount point is a good starting point. If you want, you can further extend it to multiple mount points. It would also be nice to provide an example in the documentation. A small tutorial with an example deployment would be enough. I will ping here as soon as chroot support is merged. Kind regards,
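A small Go sketch of the bind-mount-based alternative suggested above, using the OCI runtime-spec types in which the container configuration is expressed; the helper names and the returned slice are illustrative rather than urunc's actual data structures:

```go
package mountsketch

import (
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// collectBindMounts walks the mounts in the container's OCI configuration and
// keeps only the bind-mount entries (this is how Kubernetes volumes usually
// reach the runtime), so they can later be exposed to the guest.
func collectBindMounts(spec *specs.Spec) []specs.Mount {
	var binds []specs.Mount
	for _, m := range spec.Mounts {
		if isBindMount(m) {
			binds = append(binds, m)
		}
	}
	return binds
}

// isBindMount reports whether an entry is a bind mount, either via its type
// or via a "bind"/"rbind" option.
func isBindMount(m specs.Mount) bool {
	if m.Type == "bind" {
		return true
	}
	for _, opt := range m.Options {
		if opt == "bind" || opt == "rbind" {
			return true
		}
	}
	return false
}
```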
Hello @ciprian-barbu, we finally have support for creating and changing the rootfs before the execution of the monitor process (see #187). I have also created the mount_vol branch, which performs the necessary mounts for the bind mounts in the container's configuration. The branch needs much more work, but it seems to work. So, it would be nice if you rebase your branch over Edit: You should not need the first commit with the quick fix for the network namespace ( Edit 2: The code looks fine to me. I would vote, though, for a change of the annotation's name (e.g. sharedVolumeMntPoint?). Besides 9pfs there are other ways to have a shared directory between the host and the guest, hence we can use the same annotation for such cases (e.g. virtio-fs).
Hello, thank you for the update. I had a quick look at the monitor process feature, but I didn't have time to dive too deep into it. I'm still a bit puzzled about whether the volumes specified through Kubernetes should appear in the unikernel at runtime, so I need to test it a bit and get a good understanding of what is actually happening. I will get back with details when I'm ready. BR,
Hi, I noticed that qemu is now called from a hardcoded path of /usr/share/qemu. Furthermore, if I build and install urunc manually and then run a Pod, I get an error about it: I noticed there is a deployment manifest which would automatically install these dependencies, but is it production ready? Am I supposed to install urunc that way? BR,
Hi @ciprian-barbu! Indeed, we are in the process of refactoring the way we use the underlying hypervisors. As part of this effort, we wouldn't like to assume where the user has placed the qemu binary artifacts. This is the reason we have added a new function that looks for the artifacts in Regarding let us know how it goes! Thanks again for sharing your findings and taking the time to play with urunc!
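A minimal sketch of the general idea of discovering the monitor binary instead of hardcoding its path; the function name and the candidate directories are assumptions for the example, not urunc's actual lookup logic:

```go
package mountsketch

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// locateMonitor looks for a monitor binary (e.g. "qemu-system-x86_64") first
// on PATH and then in a few conventional directories, instead of assuming a
// single hardcoded location. The candidate list is illustrative only.
func locateMonitor(binary string) (string, error) {
	if p, err := exec.LookPath(binary); err == nil {
		return p, nil
	}
	for _, dir := range []string{"/usr/local/bin", "/usr/bin", "/usr/libexec"} {
		candidate := filepath.Join(dir, binary)
		if info, err := os.Stat(candidate); err == nil && !info.IsDir() {
			return candidate, nil
		}
	}
	return "", fmt.Errorf("%s not found on PATH or in known directories", binary)
}
```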
Hello, I spent most of today testing the latest urunc and the code in mount_vol. Eventually I pulled the docker image for urunc-deploy and copied the resources from there onto my worker node. I will have to test with the actual deployment manifest, because it is interesting for me too; it will spare me the work of deploying urunc in a controlled manner. Looking at mount_vol, I ran into issues because urunc was trying to mount regular files (e.g. etc-hosts and resolv.conf) using the unix.Mount functionality in Go. It looks like urunc might need to consider what to do with these Kubernetes-maintained resources, which is not ideal. Perhaps the mounts of type "bind" which are not directories could simply be ignored for now. I need to test a bit more what happens with Kubernetes volumes mounted this way; for instance, I test with the nfs-provider, which creates PersistentVolumeClaims that get mounted like this (excerpt from config.json):
I was expecting the PVC to get mounted in the rootfs location, but for some reason only the 'mkdir' part seems to work, and the mount seems to fail. So I need to check what is wrong and come back with some more details. BR,
ouch, that's a bit tricky -- bear with us for a little while, we plan to distribute supported hypervisors as statically built binaries alongside urunc so you could have a single tar you could download and unpack on a node.
that's the rationale of urunc-deploy, taken from an EKS-specific use-case ;)
I'm not sure I understand -- IIRC it is possible to bind mount these files. The issue you mention about k8s is what we have been looking into for the kata-containers case as well. In principle, we can copy the files into the pivoted rootfs. However, if these files change, we would have to re-trigger the copy, which is not feasible with urunc (so we would lose the k8s-specific functionality of secret rollout etc.). We will have to think about how to handle these.
sounds like a bug -- NFS-mounted dirs can be bind-mounted and we should be able to handle them properly. Perhaps it has something to do with the flags, as we're being quite restrictive there.
Regardless of whether it works or not, the current implementation in mount_vol will first try os.MkdirAll(dstPath, 0755), as if the source were always a directory. So this needs to be fixed.
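One possible shape of the fix, sketched under the assumption that the mount source can be either a directory or a regular file (as with the kubelet-managed etc-hosts and resolv.conf); the helper name is hypothetical, not part of mount_vol:

```go
package mountsketch

import (
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// prepareAndBindMount creates a matching mount target for srcPath inside the
// pivoted rootfs: a directory for directory sources, an empty file for regular
// file sources, and then bind-mounts the source onto it.
func prepareAndBindMount(srcPath, dstPath string) error {
	info, err := os.Stat(srcPath)
	if err != nil {
		return fmt.Errorf("stat %s: %w", srcPath, err)
	}
	if info.IsDir() {
		if err := os.MkdirAll(dstPath, 0o755); err != nil {
			return err
		}
	} else {
		// For regular files the target of a bind mount must be a file,
		// so create the parent directory and an empty placeholder file.
		if err := os.MkdirAll(filepath.Dir(dstPath), 0o755); err != nil {
			return err
		}
		f, err := os.OpenFile(dstPath, os.O_CREATE|os.O_WRONLY, 0o644)
		if err != nil {
			return err
		}
		f.Close()
	}
	return unix.Mount(srcPath, dstPath, "", unix.MS_BIND|unix.MS_REC, "")
}
```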
I have also tested with a local PersistentVolumeClaim, and I still don't get the mount showing up. In fact, my unikernel lists all the files and directories on the rootfs, and for some reason not even the mount destination dir shows up. I'm still investigating, not sure what is wrong.
aha! now I got it :D We'll fix it ASAP, I'll let you know so you can rebase over the branch.
We'll need a bit of time to investigate this as we're currently lacking CPU cycles. You'll probably find it first, but in any case if we get a chance to test I'll let you know as well. Thanks again for taking the time to look into
Sure, take your time, I'm experimenting with my own fork for now. For the other problem, looking at the qemu command I don't see an obvious parameter that passes the pivoted rootfs, which is why I think the unikernel sees a different view of the rootfs than what urunc prepares:
Can you check the latest with the following yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-fc-deployment
spec:
  selector:
    matchLabels:
      app: nginx-fc
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-fc
    spec:
      runtimeClassName: urunc
      containers:
      - name: nginx-fc
        image: harbor.nbfc.io/nubificus/urunc/nginx-firecracker-unikraft-initrd:latest
        ports:
        - containerPort: 80
```

the pod gets spawned correctly:
and the resulting pivoted rootfs is the following:
The same should stand for QEMU. AFAIU there's no parameter needed for the pivoted rootfs to be passed to QEMU, as the monitor is spawned from the pivoted rootfs. So everything should be relative to the original pivoted path (maybe I'm wrong here).
Hi, I tested this new version and it works fine, I see that the hosts, hostname and resolv.conf files are copied in the pivoted rootfs.
I think you are wrong here: qemu is essentially a machine emulator, and you need some sort of emulated device where the rootfs exists. That is why there are backends like virtiofsd or 9pfs, to help expose an unstructured location on the host to the guest, presented as some sort of real (but emulated) device. With the unikernel that I used in my tests, which lists all files in the filesystem, I still couldn't see the pivoted rootfs contents in the output. So I modified the code a bit to mount "/" from the host (which is correct because urunc performs chroot) inside the unikernel at "/mnt". Mounting it at "/" in the guest creates other problems, as the kernel is not present in the pivoted rootfs. On the other hand, I was looking on the host at the pivoted rootfs location and the mount was not present, but this is because urunc pivots into a different mount namespace, so it's not supposed to show up in the global namespace. It looks like some more work is needed; I will wait until you get some more time to work on it. BR,
Here are my resources:
I can see /mnt/9pfs in the listing, but no files in it. However, the unikernel lists the following:
So you can see that the unikernel sees the correct contents in /mnt/mnt/9pfs, although on the host nothing is shown. This is because, I think, the mount is not present in the default namespace on the host, but in the new namespace.
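A self-contained Go sketch of the namespace behaviour described here, assuming a Linux host with enough privileges; it is not urunc code, and the /srv/data and /mnt paths are placeholders. It shows that a bind mount performed after unsharing the mount namespace (with private propagation) is not visible from the host's default namespace:

```go
package main

import (
	"log"
	"runtime"

	"golang.org/x/sys/unix"
)

func main() {
	// Unshare applies to the calling thread, so pin the goroutine to it.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// Enter a new mount namespace; from here on, mounts are private to it.
	if err := unix.Unshare(unix.CLONE_NEWNS); err != nil {
		log.Fatalf("unshare: %v", err)
	}
	// Make all mounts private so nothing propagates back to the host.
	if err := unix.Mount("none", "/", "", unix.MS_REC|unix.MS_PRIVATE, ""); err != nil {
		log.Fatalf("make-private: %v", err)
	}
	// This bind mount is visible to this process (and its children), but a
	// shell in the host's default namespace will not see it. Paths are placeholders.
	if err := unix.Mount("/srv/data", "/mnt", "", unix.MS_BIND, ""); err != nil {
		log.Fatalf("bind mount: %v", err)
	}
	log.Println("bind mount set up inside the private mount namespace")
}
```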
thanks for the info @ciprian-barbu and the detailed explanation! Indeed, what the mount_vol branch does is bring the spec-related volumes into the rootfs of the container (where the monitor will then spawn the unikernel). For the unikernel to see these contents, we need to add parameters to the monitors to allow this (depending on the unikernel framework and/or the monitor). There is ongoing work for that, tracked here: #109 (and it is essentially the only issue pending for the v0.6.0 release :D). This is work that @cmainas has been doing and it should be ready by the end of this month. You can check this commit for a quick & dirty solution: https://github.com/urunc-dev/urunc/tree/just_sharedfs but I suspect what you have already done in this branch is similar. AFAIU you would like the contents to be available in the unikernel, right? So with the current hacks this is feasible, right? Or would you like the contents to be available in the pivoted rootfs as well? I'll have to get back to you about the namespace issue you mention because I will have to look into it.
Indeed, this was the source of inspiration for all the work around this pull request. If I understand correctly, with the shared fs option there will be 2 choices to run the unikernel:
For now I will wait for the work in progress to be merged, and then we can discuss what should happen with this PR. BR,
Hello @ciprian-barbu, we have now merged the work in the For shared-fs, initially we had the following plan. If the guest requires access to files (either in the container's rootfs or from volumes), then Of course there are use cases where storage is not required, and in these scenarios the annotation will be absent and In this PR, you suggested another approach, which totally makes sense and seems quite useful. Therefore, both approaches (mounting the whole container's rootfs and mounting specific volumes) can co-exist in
Hi, sorry for the delay, I was busy with other stuff. Ideally, if that WIP results in the possibility of automatically mounting the volumes through any method, I would be happy to drop this PR. If indeed there is no other way to mount volumes on request, then I will go forward with rebasing and fixing the pending TODOs. I'm not sure that having two ways of achieving the same result is beneficial, which is why I want to understand where the mount_vol work and mountRootfs will end up. Thanks and regards,
Hello @ciprian-barbu, the PR for mounting the container's rootfs through 9pfs in the guest is this one: #194. The PR indeed mounts the volumes inside the container's rootfs (and consequently the guest can access them). However, I would still argue that the two approaches can co-exist. One argument would be performance: the more files we share through 9pfs, the slower it gets. For instance, in a use case where a service/app does not need access to the whole container rootfs, but only to a couple of files which are accessible at runtime (e.g. Kubernetes secrets), mounting every volume in the container's rootfs and then sharing it with the guest would not be that optimal. Furthermore, there might be scenarios where setting the guest's rootfs as a shared-fs might not be supported by the guest. I am not sure what will happen in Unikraft if the unikernel already contains a pre-existing rootfs (e.g. an embedded initrd) and we then try to mount the rootfs through 9pfs. We have a similar problem with Rumprun (but with block devices), which does not allow us to mount anything at "/". In such cases, the approach you propose will be useful.
Hello, I'm sorry for the delayed response, I was out of office for a while and only managed to get back to this last week. I started by experimenting with the latest main, and specifically the com.urunc.unikernel.mountRootfs annotation. But I'm having trouble with the Unikraft unikernels I'm using, which worked fine in the past using urunc from the branch just_sharedfs. Now, I get errors right after the BIOS boot menu:
From what I can tell, looking at the rootfs structure on the worker, the rootfs is not properly instantiated on the pivot chroot, and some necessary files are not present, e.g. I'm not very familiar with what the OCI image should look like, so I started looking at the instructions about using bunny. However, this implies that the unikernel was already built, so I don't have a complete step-by-step routine for testing urunc with a known image. I will try using one of the images on harbor.nbfc.io in the meantime, but it would be great if you could provide some details on how to properly package the unikernel. Thanks and regards,
Hello @ciprian-barbu , of course. Let me first summarize the new steps that In all cases, In the case the image uses the
So, if the container image has set the The fastest way to test out this would be to use the
In order to build an image with With Containerfile:
With bunnyfile:
Make sure to replace:
Also, keep in mind that
Hello, forgive me, but I think there is some context missing here... I spent the last few days looking at the bunny documentation and talking to the Unikraft guys, and I still don't think I have the whole picture. I thought it would be easy to build a unikernel which I can use to test urunc with mountRootfs: "true", but this is really not the case. According to the Unikraft developers, their kraftkit can only generate a cpio rootfs, which, like you said before, cannot work with 9pfs. So instead I'm trying to find some instructions on building and packaging a Unikraft unikernel (I'm not familiar with other frameworks) which can work. How exactly do you go about that? Is it supposed to work something like this?
Am I missing something? I get the following error: Thanks and regards,
Hello @ciprian-barbu , I think it is a bit simpler than your description. The only requirement is a Unikraft unikernel which supports 9pfs and can mount it as its rootfs. Let's try as an example a Unikraft Nginx unikernel that uses shared-fs for its rootfs.
As a rootfs, we choose the same image that is defined in the Dockerfile that
We build the above with docker:
and we use the new image as the rootfs in the bunnyfile. We can also update the cmdline to
which returns 404, since I forgot to add Of course the above method uses the whole nginx image as the rootfs, but you can significantly reduce it, as Unikraft does in the respective Dockerfile. Hence, you could use the Dockerfile of the respective example to create a container image rootfs which can later be declared in Also, playing around with Unikraft's configuration will eliminate the vfscore error shown above.
Thank you @cmainas very much for the details, this is exactly what I was looking for: very specific instructions for Unikraft. I will try them and get back to you as soon as I can. I was actually struggling to understand how Unikraft unikernels package their images and how urunc expects them to look, so this information will be very useful; perhaps it can go into the bunny documentation as a step-by-step howto for Unikraft. I also want to mention that I tried out the image you mentioned before I'm still looking at the code and scratching my head, but am I missing something? Best regards,
Here are the details I used; I ran this in my k3s environment. You can ignore the nfs-storage part, it doesn't really have anything to do with how the unikernel starts, but I'm using it so I can validate that any volumes specified in the Kubernetes template will be mounted in the rootfs, which will then be available because 9pfs mounts the entire file system. Kubernetes template:
And here is the qemu command on the worker; you can see that I'm overriding the command to "sleep 3600" so I can go around and look at config.json, state.json and other Kubernetes resources:
Hello @ciprian-barbu, I am glad that the above information was helpful. We need to improve our documentation for sure. Any suggestions are welcome. Regarding the linux image and the use of block instead of 9pfs, this is the correct behavior. The In the case of Unikraft, since we are unaware regarding its block support, If you do not want to use the block device as the rootfs and you prefer 9pfs even with Linux, then you can change the default snapshotter for
or simply remove it if the default snapshotter is overlayfs or any other non-block-based snapshotter.
I have new details. When running with the
I was surprised to see that unikernelType is But what is important is that when Then, at line 324, rootFsDevice.FsType will return the type of driver used by the main partition; for me it is For the unikraft unikernel type, supportsBlock returns false, and the behavior is different. I added some traces with fmt.Println and I could see that for the But my problem is not how But I'm wondering if it is even possible to test the mountRootFs feature using images with unikernelType set to Thank you,
Hello @ciprian-barbu, you are correct. With the new release, TL;DR: You simply need to use a non-block-based snapshotter (e.g. overlayfs). In unikraft,
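A rough sketch of the decision described in the last few comments, purely illustrative (the names do not mirror urunc's code): the container rootfs is attached as a block device only when the snapshotter produced a block-backed rootfs (e.g. devmapper) and the unikernel framework supports block; otherwise it is shared over 9pfs.

```go
package mountsketch

// rootfsStrategy captures the two ways the container rootfs can reach the guest.
type rootfsStrategy string

const (
	attachAsBlockDevice rootfsStrategy = "block"
	shareOverNinePFS    rootfsStrategy = "9pfs"
)

// chooseRootfsStrategy is an illustrative decision function: block devices are
// only used when the snapshotter produced a block-backed rootfs (e.g. devmapper)
// and the unikernel framework can actually mount a block device as its rootfs.
func chooseRootfsStrategy(snapshotterIsBlockBased, frameworkSupportsBlock bool) rootfsStrategy {
	if snapshotterIsBlockBased && frameworkSupportsBlock {
		return attachAsBlockDevice
	}
	return shareOverNinePFS
}
```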
A simple implementation of using 9pfs to share volumes between different unikernels.
It uses a new annotation called "com.urunc.unikernel.ninePFSMntPoint" which points to the mount point inside the unikernel, thus identifying the volume specified by the user.
The source location on the host machine is automatically identified by urunc.