
Mount Kubernetes volumes using 9pfs #169


Open
wants to merge 2 commits into base: main

Conversation

ciprian-barbu

A simple implementation of using 9pfs to share volumes between different unikernels.
It uses a new annotation called "com.urunc.unikernel.ninePFSMntPoint" which points to the mount point inside the unikernel, thus identifying the volume specified by the user.
The source location on the host machine is automatically identified by urunc.
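For illustration, a Pod spec using the annotation could look like this (a minimal sketch; the claim name, image and mount path are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: unikernel-9pfs-example
  annotations:
    # tells urunc which mount point inside the unikernel the volume maps to
    com.urunc.unikernel.ninePFSMntPoint: "/mnt/9pfs"
spec:
  runtimeClassName: urunc
  volumes:
    - name: shared-data
      persistentVolumeClaim:
        claimName: my-claim              # placeholder claim
  containers:
    - name: unikernel-main
      image: my-unikernel-image:latest   # placeholder unikernel OCI image
      volumeMounts:
        - name: shared-data
          mountPath: /mnt/9pfs           # must match the annotation value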

netlify bot commented May 27, 2025

Deploy Preview for urunc canceled.

🔨 Latest commit: 12331ed
🔍 Latest deploy log: https://app.netlify.com/projects/urunc/deploys/6835d3a4ac880100082f48f6

cmainas and others added 2 commits May 27, 2025 14:59
Signed-off-by: Charalampos Mainas <charalampos.mainas@gmail.com>
Implement a method of mounting local paths inside unikernels
(Unikraft only at this moment) based on a dedicated annotation.
This can be used in conjunction with NFS shared volumes in a Kubernetes
cluster, when an NFS provider has been configured.

The new annotation is called "com.urunc.unikernel.ninePFSMntPoint" and
it holds the mount point inside the unikernel, or in other words, the
value of "mountPath" specified in the volumeMount field of the
Kubernetes template.

The recommended usage is to set it in the Pod specification of the
Kubernetes template.

In the Unikontainer Exec method, urunc will search the list of mount points
passed from containerd and find a match for the mount point specified by
the annotation. Once found, urunc will be able to determine the local
path on the worker node of the PersistentVolume or PersistentVolumeClaim
and instruct qemu to mount it via the 9pfs provider into the unikernel.

Signed-off-by: Ciprian Barbu <ciprian.barbu@thalesgroup.com>
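For reference, sharing a host directory with a QEMU guest over 9pfs is typically done with an fsdev/virtio-9p device pair along these lines (illustrative values, not the exact flags urunc builds):

qemu-system-x86_64 ... \
  -fsdev local,id=fs0,path=<host-path-of-the-volume>,security_model=none \
  -device virtio-9p-pci,fsdev=fs0,mount_tag=fs0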
@ciprian-barbu
Author

ok-to-test

@ananos
Contributor

ananos commented May 28, 2025

Hi @ciprian-barbu!

thanks for taking the time to craft this! @cmainas will follow up on the rationale of using the shared fs and we can take it from there!

Regardless, we are in the process of moving the repo to its own org, and we will need to refactor (remove/update) some parts of the CI. For instance, one step is failing due to GitHub security limitations on accessing secrets. Bear with us and we will provide steps to rebase your PR on top of a "fixed" CI workflow.

In the meantime please reach out to us if you need any additional info.

thanks again for your contribution, looking forward to getting it merged!

@cmainas
Contributor

cmainas commented May 29, 2025

Hello @ciprian-barbu ,

thank you for opening this PR. Indeed mounting volumes to the unikernel is necessary for a lot of use cases. We were planning to add support for shared-fs in the upcoming release, so your PR comes at a very good time too. The approach in the PR looks good and, except for some minor comments, merging it would not be a problem. It would be nice though if we could talk over 2 points: a) shared-fs security and b) scalability of the proposed approach.

Security:
We were intentionally delaying support for shared-fs due to security concerns. Sharing data between the host and guest weakens isolation boundaries and increases the potential attack surface. However, we recognize that certain workloads may require access to host data, making host-guest file sharing necessary in some scenarios. To reduce host exposure and limit the guest’s access to the host filesystem, we prioritized implementing chroot support before enabling shared-fs. This approach ensures the monitor operates within an isolated root filesystem, making potential escapes (and harming the host rootfs) more difficult. I'm currently working on this implementation and hopefully it will be merged in the upcoming days.
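As a rough illustration of that chroot step (a sketch, not the actual urunc code, and the paths here are hypothetical):

package main

import (
    "os"

    "golang.org/x/sys/unix"
)

// enterRootfs confines the current process to the prepared rootfs before the
// monitor is exec'ed, so a monitor escape only exposes that directory and not
// the host root filesystem.
func enterRootfs(rootfs string) error {
    if err := unix.Chroot(rootfs); err != nil {
        return err
    }
    return os.Chdir("/")
}

func main() {
    // Hypothetical rootfs path and monitor binary, for illustration only.
    if err := enterRootfs("/run/urunc/rootfs"); err != nil {
        panic(err)
    }
    // The monitor binary must already exist inside the new rootfs.
    err := unix.Exec("/usr/local/bin/qemu-system-x86_64",
        []string{"qemu-system-x86_64", "-nographic"}, os.Environ())
    panic(err) // Exec only returns on error
}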

Would it be ok to wait for adding support for chroot first, before we merge this PR? Unfortunately, both changes touch a few common lines and rebase might require some effort, but I will be happy to help with it.

Scalability:
Regarding scalability, and please correct me if I’ve misunderstood your PR, it looks like the proposed approach currently supports only a single mount point. While that’s perfectly fine for initial support, I’m curious how challenging it would be to extend this to support multiple mount points. A few approaches on that:

  1. Instead of mounting individual directories, we could share the entire container root filesystem and mount it as the rootfs of the unikernel. This was actually our original idea, since the unikernel may need access to data inside the container image itself. Additionally, any volumes mounted into the container would also become accessible from the unikernel.
  2. Instead of using the annotation to specify the mount point inside the unikernel, we could use it to instruct urunc to mount all the bind mounts. Therefore, urunc will traverse u.Spec.Mounts and store all bind mounts in mountSpecs.

Please let me know if I did not understand your approach correctly. Also, what do you think about the above two approaches?

Thank you again for the effort and your contribution.

P.S.: I think the runners should be ok now with the repo transfer. Therefore, rerunning the tests should work (hopefully).

Edit: For re-running the tests, you will also need to rebase the PR over the main branch.

@ciprian-barbu
Author

Hello,

See my responses below

Hello @ciprian-barbu ,

thank you for opening this PR. Indeed mounting volumes to the unikernel is necessary for a lot of use cases. We were planning to add support for shared-fs in the upcoming release, so your PR comes at a very good time too. The approach in the PR looks good and, except for some minor comments, merging it would not be a problem. It would be nice though if we could talk over 2 points: a) shared-fs security and b) scalability of the proposed approach.

Security: We were intentionally delaying support for shared-fs due to security concerns. Sharing data between the host and guest weakens isolation boundaries and increases the potential attack surface. However, we recognize that certain workloads may require access to host data, making host-guest file sharing necessary in some scenarios. To reduce host exposure and limit the guest’s access to the host filesystem, we prioritized implementing chroot support before enabling shared-fs. This approach ensures the monitor operates within an isolated root filesystem, making potential escapes (and harming the host rootfs) more difficult. I'm currently working on this implementation and hopefully it will be merged in the upcoming days.

Would it be ok to wait for adding support for chroot first, before we merge this PR? Unfortunately, both changes touch a few common lines and rebase might require some effort, but I will be happy to help with it.

Sure, it is fine with me, you are the maintainers, and our timeline in the project is not that tight. I haven't thought about the security aspects, so anything you propose is fine with me.

Scalability: Regarding scalability, and please correct me if I’ve misunderstood your PR, it looks like the proposed approach currently supports only a single mount point. While that’s perfectly fine for initial support, I’m curious how challenging it would be to extend this to support multiple mount points. A few approaches on that:

Indeed, I initially thought about allowing multiple volume mounts, but I got stuck at some point and decided to create a PR to ask for comments. I will look into this option, maybe the new annotation can point to a list of strings, instead of just one.

  1. Instead of mounting individual directories, we could share the entire container root filesystem and mount it as the rootfs of the unikernel. This was actually our original idea, since the unikernel may need access to data inside the container image itself. Additionally, any volumes mounted into the container would also become accessible from the unikernel.
  2. Instead of using the annotation to specify the mount point inside the unikernel, we could use it to instruct urunc to mount all the bind mounts. Therefore, urunc will traverse u.Spec.Mounts and store all bind mounts in mountSpecs.

I think it's ok, do you have some work in progress I can look at?
The u.Spec.Mounts specifies a lot of mounts, inherited from containerd, which make sense for a classic container. Thus my idea of introducing the annotation to specify which mount spec needs to be mounted inside the unikernel.

Please let me know if I did not understand your approach correctly. Also, what do you think about the above two approaches?

The idea was to send the change for comments, which you provided plenty of, so thank you!
Should I continue with this PR, or do you prefer to work on it yourself, given you have the plan for shared-fs either way?
It's fine with me either way.

Thank you again for the effort and your contribution.

P.S.: I think the runners should be ok now with the repo transfer. Therefore, rerunning the tests should work (hopefully).

Edit: For re-running the tests, you will also need to rebase the PR over the main branch.

Best regards,
Ciprian

@cmainas
Contributor

cmainas commented May 29, 2025

Hello,

  1. Instead of mounting individual directories, we could share the entire container root filesystem and mount it as the rootfs of the unikernel. This was actually our original idea, since the unikernel may need access to data inside the container image itself. Additionally, any volumes mounted into the container would also become accessible from the unikernel.
  2. Instead of using the annotation to specify the mount point inside the unikernel, we could use it to instruct urunc to mount all the bind mounts. Therefore, urunc will traverse u.Spec.Mounts and store all bind mounts in mountSpecs.

I think it's ok, do you have some work in progress I can look at? The u.Spec.Mounts specifies a lot of mounts, inherited from containerd, which make sense for a classic container. Thus my idea of introducing the annotation to specify which mount spec needs to be mounted inside the unikernel.

That is correct, but typically volumes are passed as bind mount entries in the container configuration. Therefore, I would vote to build on your PR and, instead of using the annotation to define the mount point, check the mount type of each entry and, if it is a bind mount, store it in mountSpecs. However, we can still use the annotation to enable/disable such mounting. For instance, if a unikernel is built without 9pfs support, trying to pass a 9pfs device might result in a failure. In that scenario, users should have the option to disable mounting (and vice versa).
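A sketch of what that traversal could look like, assuming the spec mounts use the OCI runtime-spec Mount type and mountSpecs is simply a slice collected by urunc (names here are illustrative):

package mounts

import (
    specs "github.com/opencontainers/runtime-spec/specs-go"
)

// collectBindMounts keeps only the bind-mount entries from the container
// spec, which is where Kubernetes volumes typically show up.
func collectBindMounts(mounts []specs.Mount) []specs.Mount {
    var mountSpecs []specs.Mount
    for _, m := range mounts {
        if m.Type == "bind" {
            mountSpecs = append(mountSpecs, m)
        }
    }
    return mountSpecs
}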

Please let me know if I did not understand your approach correctly. Also, what do you think about the above two approaches?

The idea was to send the change for comments, which you provided plenty of, so thank you! Should I continue with this PR, or do you prefer to work on it yourself, given you have the plan for shared-fs either way? It's fine with me either way.

I think we can keep working in this PR. Even a single mount point is a good starting point. If you want, you can further extend it to multiple mount points. It would also be nice to provide an example in the documentation. A small tutorial with an example deployment would be enough. I will ping here as soon as chroot support is merged.

Kind regards,
Babis

@cmainas
Contributor

cmainas commented Jun 6, 2025

Hello @ciprian-barbu ,

we finally have support for creating and changing the rootfs before the execution of the monitor process (see #187). I have also created the mount_vol branch, which performs the necessary mounts for the bind mounts in the container's configuration. The branch needs much more work, but it seems to work.

So, it would be nice if you rebase your branch over mount_vol branch and if everything works well, we will be happy to merge your PR.

Edit: You should not need the first commit with the quick fix for network namespace (12df4aaa2fdc2), since we overcame the issue with Go and namespaces.

Edit 2: The code looks fine to me. I would vote though for renaming the annotation (e.g. sharedVolumeMntPoint?). Besides 9pfs, there are other ways to have a shared directory between the host and the guest, hence we can use the same annotation for such cases (e.g. virtio-fs).

@ciprian-barbu
Author

Hello,

Thank you for the update. I had a quick look at the monitor process feature, but I didn't have time to dive too deep into it.
I also looked at the mount_vol branch and I will use it as a starting point.

I'm still a bit puzzled whether the volumes specified through Kubernetes should appear in the unikernel runtime, so I need to test it out a bit and get a good understanding of what is actually happening.

I will get back with details when I'm ready.

BR,
Ciprian

@ciprian-barbu
Author

Hi,

I noticed that qemu is now called from a hardcoded path of /usr/share/qemu. Furthermore, if I try to build and install urunc manually, and then run a Pod, I get an error about it:
failed to stat file /usr/share/qemu: no such file or directory

I noticed there is a deployment manifest which would automatically install these dependencies, but is it production ready? Am I supposed to install urunc that way?

BR,
Ciprian

@ananos
Contributor

ananos commented Jun 10, 2025

Hi @ciprian-barbu!

indeed, we are in the process of refactoring the way we use the underlying hypervisors. As part of this effort, we wouldn't like to assume where the user has placed the qemu binary artifacts. This is the reason we have added a new function that looks for the artifacts in /usr/local/share and falls back to /usr/share/qemu. One option is to rebase once more over mount_vol (just added the extra commit with this functionality and force-pushed the branch), or just use qemu from your distro's package manager.
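A sketch of such a lookup (not the actual urunc function; the exact candidate directories are an assumption based on the description above):

package monitor

import (
    "fmt"
    "os"
)

// qemuDataDir returns the first existing QEMU data directory, preferring
// /usr/local/share/qemu and falling back to /usr/share/qemu.
func qemuDataDir() (string, error) {
    for _, dir := range []string{"/usr/local/share/qemu", "/usr/share/qemu"} {
        if info, err := os.Stat(dir); err == nil && info.IsDir() {
            return dir, nil
        }
    }
    return "", fmt.Errorf("no QEMU data directory found")
}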

Regarding urunc-deploy, we have tested its functionality in EKS/local k3s and k8s clusters. You can try using this as well (for the initial installation) and then go through the make / make install process to replace the urunc-specific binaries with your changes.

let us know how it goes! thanks again for sharing your findings and taking the time to play with urunc!

@ciprian-barbu
Author

Hello,

I spent most of today testing the latest urunc and the code in mount_vol.
My main environment is on a RedHat based k3s, which makes it a bit difficult to get qemu with all the needed requirements (I'm also isolated from internet, which is a headache).

But eventually I pulled the docker image for urunc-deploy, and copied the resources from there and onto my worker node. I will have to test with the actual deployment manifest, because it is interesting for me too, it will spare me the work to deploy urunc in a controlled manner.

Looking at mount_vol, I got into issues because urunc was trying to mount regular files (e.g. etc-hosts and resolv.conf) using the unix.Mount functionality in go. It looks like urunc might need to consider what to do with these Kubernetes maintained resources, which is not ideal. Perhaps the mounts of type "bind" which are not directories could be simply ignored for now.

I need to test a bit more what happens with Kubernetes volumes mounted this way, for instance I test with the nfs-provider which creates PersistentVolumeClaims, which get mounted in something like this (excerpt from config.json)

{
  "destination": "/mnt/9pfs",
  "type": "bind",
  "source": "/var/lib/kubelet/pods/b9cb86f8-6d4c-4cf7-a7bc-dbf9a549fb69/volumes/kubernetes.io~nfs/pvc-65fedcd1-f70c-433f-8f21-55141d216468",
  "options": [
    "rbind",
    "rprivate",
    "rw"
  ]
}

I was expecting the pvc to get mounted in the rootfs location, but for some reason only the 'mkdir' part seems to work, and the mount seems to fail. So I need to check what is wrong and come back with some more details.

BR,
Ciprian

@ananos
Contributor

ananos commented Jun 11, 2025

I spent most of today testing the latest urunc and the code in mount_vol. My main environment is on a RedHat based k3s, which makes it a bit difficult to get qemu with all the needed requirements (I'm also isolated from internet, which is a headache).

ouch, that's a bit tricky -- bear with us for a little while, we plan to distribute supported hypervisors as statically built binaries alongside urunc so you could have a single tar you could download and unpack on a node.

But eventually I pulled the docker image for urunc-deploy, and copied the resources from there and onto my worker node. I will have to test with the actual deployment manifest, because it is interesting for me too, it will spare me the work to deploy urunc in a controlled manner.

that's the rationale of urunc-deploy, taken from an EKS-specific use-case ;)

Looking at mount_vol, I got into issues because urunc was trying to mount regular files (e.g. etc-hosts and resolv.conf) using the unix.Mount functionality in go. It looks like urunc might need to consider what to do with these Kubernetes maintained resources, which is not ideal. Perhaps the mounts of type "bind" which are not directories could be simply ignored for now.

I'm not sure I understand -- IIRC it is possible to bind mount these files. The issue you mention about k8s is what we have been looking into for the kata-containers case as well. In principle, we can copy the files in the pivoted rootfs. However, if these files change, we would have to re-trigger the copy, which is something that is not feasible with urunc (so we will lose the k8s-specific functionality of secret rollout etc.). We will have to think about how to handle these.

I need to test a bit more what happens with Kubernetes volumes mounted this way, for instance I test with the nfs-provider which creates PersistentVolumeClaims, which get mounted in something like this (excerpt from config.json)
[snipped]
I was expecting the pvc to get mounted in the rootfs location, but for some reason only the 'mkdir' part seems to work, and the mount seems to fail. So I need to check what is wrong and come back with some more details.

sounds like a bug -- NFS-mounted dirs can be bind mounted and we should be able to handle it properly. Perhaps it has something to do with the flags as we're being quite restrictive on that.

@ciprian-barbu
Author

ciprian-barbu commented Jun 12, 2025

I spent most of today testing the latest urunc and the code in mount_vol. My main environment is on a RedHat based k3s, which makes it a bit difficult to get qemu with all the needed requirements (I'm also isolated from internet, which is a headache).

ouch, that's a bit tricky -- bear with us for a little while, we plan to distribute supported hypervisors as statically built binaries alongside urunc so you could have a single tar you could download and unpack on a node.

But eventually I pulled the docker image for urunc-deploy, and copied the resources from there and onto my worker node. I will have to test with the actual deployment manifest, because it is interesting for me too, it will spare me the work to deploy urunc in a controlled manner.

that's the rationale of urunc-deploy, taken from an EKS-specific use-case ;)

Looking at mount_vol, I got into issues because urunc was trying to mount regular files (e.g. etc-hosts and resolv.conf) using the unix.Mount functionality in go. It looks like urunc might need to consider what to do with these Kubernetes maintained resources, which is not ideal. Perhaps the mounts of type "bind" which are not directories could be simply ignored for now.

I'm not sure I understand -- IIRC it is possible to bind mount these files. The issue you mention about k8s is what we have been looking into for the kata-containers case as well. In principle, we can copy the files in the pivoted rootfs. However, if these files change, we would have to re-trigger the copy, which is something that is not feasible with urunc (so we will lose the k8s-specific functionality of secret rollout etc.). We will have to think about how to handle these.

Regardless of whether it works or not, the current implementation in mount_vol will first try to os.MkdirAll(dstPath, 0755), as if the source is always a directory. So this needs to be fixed.
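One possible fix, sketched under the assumption that the bind mount itself is done with unix.Mount: stat the source and create either a directory or an empty file as the mount target.

package mounts

import (
    "os"
    "path/filepath"

    "golang.org/x/sys/unix"
)

// bindMount prepares dstPath according to the type of srcPath
// (directory vs. regular file) and then bind-mounts it.
func bindMount(srcPath, dstPath string) error {
    info, err := os.Stat(srcPath)
    if err != nil {
        return err
    }
    if info.IsDir() {
        if err := os.MkdirAll(dstPath, 0o755); err != nil {
            return err
        }
    } else {
        // Regular files (e.g. /etc/hosts, /etc/resolv.conf) need an
        // existing file as the mount target, not a directory.
        if err := os.MkdirAll(filepath.Dir(dstPath), 0o755); err != nil {
            return err
        }
        f, err := os.OpenFile(dstPath, os.O_CREATE, 0o644)
        if err != nil {
            return err
        }
        f.Close()
    }
    return unix.Mount(srcPath, dstPath, "", unix.MS_BIND|unix.MS_REC, "")
}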

I need to test a bit more what happens with Kubernetes volumes mounted this way, for instance I test with the nfs-provider which creates PersistentVolumeClaims, which get mounted in something like this (excerpt from config.json)
[snipped]
I was expecting the pvc to get mounted in the rootfs location, but for some reason only the 'mkdir' part seems to work, and the mount seems to fail. So I need to check what is wrong and come back with some more details.

sounds like a bug -- NFS-mounted dirs can be bind mounted and we should be able to handle it properly. Perhaps it has something to do with the flags as we're being quite restrictive on that.

I have also tested with a local PersistentVolumeClaim, and I still don't get the mount showing up. In fact, my unikernel lists all the files and directories on the rootfs, and for some reason not even the mount destination dir shows up. I'm still investigating, not sure what is wrong.

@ananos
Contributor

ananos commented Jun 12, 2025

Looking at mount_vol, I got into issues because urunc was trying to mount regular files (e.g. etc-hosts and resolv.conf) using the unix.Mount functionality in go. It looks like urunc might need to consider what to do with these Kubernetes maintained resources, which is not ideal. Perhaps the mounts of type "bind" which are not directories could be simply ignored for now.

I'm not sure I understand -- IIRC it is possible to bind mount these files. The issue you mention about k8s is what we have been looking into for the kata-containers case as well. In principle, we can copy the files in the pivoted rootfs. However, if these files change, we would have to re-trigger the copy, which is something that is not feasible with urunc (so we will lose the k8s-specific functionality of secret rollout etc.). We will have to think about how to handle these.

Regardless of whether it works or not, the current implementation in mount_vol will first try to os.MkdirAll(dstPath, 0755), as if the source is always a directory. So this needs to be fixed.

aha! now I got it :D We'll fix it ASAP, I'll let you know so you can rebase over the branch.

I have also tested with a local PersistentVolumeClaim, and I still don't get the mount showing up. In fact, my unikernel lists all the files and directories on the rootfs, and for some reason not even the mount destination dir shows up. I'm still investigating, not sure what is wrong.

We'll need a bit of time to investigate this as we're currently lacking CPU cycles. You'll probably find it first, but in any case if we get a chance to test I'll let you know as well.

thanks again for taking the time to look into urunc!

@ciprian-barbu
Author

aha! now I got it :D We'll fix it ASAP, I'll let you know so you can rebase over the branch.

Sure, take your time, I'm experimenting with my own fork for now.
For reference, the unikernel code that I'm using comes from one of my colleagues who opened a ticket previously, see this reply:
#135 (comment)

For the other problem, looking at the qemu command I don't see an obvious parameter which specifies the pivoted rootfs passed as an argument, which is why I think the unikernel sees a different view of the rootfs than what is prepared by urunc:

root 13119 0.5 0.2 735256 94816 ? Ssl 11:20 0:27 /usr/local/bin/qemu-system-x86_64 -m 512M -L /usr/share/qemu -cpu host -enable-kvm -nographic -vga none -kernel /unikraft/bin/kernel -net nic,model=virtio -net tap,script=no,downscript=no,ifname=tap0_urunc -append /unikernel/main.qemu env.vars=[ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=test-urunc-pivot-nfs KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_PORT_HTTPS=443 KUBERNETES_PORT=tcp://10.43.0.1:443 KUBERNETES_PORT_443_TCP=tcp://10.43.0.1:443 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT_443_TCP_ADDR=10.43.0.1 KUBERNETES_SERVICE_HOST=10.43.0.1 ] netdev.ip=10.42.1.144/24:10.42.1.247:8.8.8.8 --

@ananos
Contributor

ananos commented Jun 12, 2025

aha! now I got it :D We'll fix it ASAP, I'll let you know so you can rebase over the branch.

Sure, take your time, I'm experimenting with my own fork for now. For reference, the unikernel code that I'm using comes from one of my colleagues who opened a ticket previously, see this reply: #135 (comment)

For the other problem, looking at the qemu command I don't see an obvious parameter which specifies the pivoted rootfs passed as an argument, which is why I think the unikernel sees a different view of the rootfs than what is prepared by urunc:

root 13119 0.5 0.2 735256 94816 ? Ssl 11:20 0:27 /usr/local/bin/qemu-system-x86_64 -m 512M -L /usr/share/qemu -cpu host -enable-kvm -nographic -vga none -kernel /unikraft/bin/kernel -net nic,model=virtio -net tap,script=no,downscript=no,ifname=tap0_urunc -append /unikernel/main.qemu env.vars=[ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=test-urunc-pivot-nfs KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_PORT_HTTPS=443 KUBERNETES_PORT=tcp://10.43.0.1:443 KUBERNETES_PORT_443_TCP=tcp://10.43.0.1:443 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT_443_TCP_ADDR=10.43.0.1 KUBERNETES_SERVICE_HOST=10.43.0.1 ] netdev.ip=10.42.1.144/24:10.42.1.247:8.8.8.8 --

Can you check the latest mount_vol branch? The k8s file issue should now be fixed. Tried with this image: harbor.nbfc.io/nubificus/urunc/nginx-firecracker-unikraft-initrd:latest

with the following yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-fc-deployment
spec:
  selector:
    matchLabels:
      app: nginx-fc
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-fc
    spec:
      runtimeClassName: urunc
      containers:
      - name: nginx-fc
        image: harbor.nbfc.io/nubificus/urunc/nginx-firecracker-unikraft-initrd:latest
        ports:
        - containerPort: 80

the pod gets spawned correctly:

Name:                nginx-fc-deployment-6dd75bbd86-ckfjq
Namespace:           default
Priority:            0
Runtime Class Name:  urunc
Service Account:     default
Node:                k8s-urunc/192.168.11.33
Start Time:          Thu, 12 Jun 2025 11:50:58 +0000
Labels:              app=nginx-fc
                     pod-template-hash=6dd75bbd86
Annotations:         cni.projectcalico.org/containerID: 654eb064ffd794bc18677e13b3357eb7cf5fbb5be7ccaa4b85dc4775d1b165cc
                     cni.projectcalico.org/podIP: 10.244.39.67/32
                     cni.projectcalico.org/podIPs: 10.244.39.67/32
Status:              Running
IP:                  10.244.39.67
IPs:
  IP:           10.244.39.67
Controlled By:  ReplicaSet/nginx-fc-deployment-6dd75bbd86
Containers:
  nginx-fc:
    Container ID:   containerd://7bf0015240112d6e16e2a7a8977cbf88d261e4ab813a03861bbf623ff3465de3
    Image:          harbor.nbfc.io/nubificus/urunc/nginx-firecracker-unikraft-initrd:latest
    Image ID:       harbor.nbfc.io/nubificus/urunc/nginx-firecracker-unikraft-initrd@sha256:368c58c2a84cca7a1b3ed6115d0bbe865d584b1b17acc91a5b59704942aa8976
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 12 Jun 2025 11:50:59 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vtppp (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  kube-api-access-vtppp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  16s   default-scheduler  Successfully assigned default/nginx-fc-deployment-6dd75bbd86-ckfjq to k8s-urunc
  Normal  Pulling    16s   kubelet            Pulling image "harbor.nbfc.io/nubificus/urunc/nginx-firecracker-unikraft-initrd:latest"
  Normal  Pulled     15s   kubelet            Successfully pulled image "harbor.nbfc.io/nubificus/urunc/nginx-firecracker-unikraft-initrd:latest" in 227ms (227ms including waiting). Image size: 913654 bytes.
  Normal  Created    15s   kubelet            Created container: nginx-fc
  Normal  Started    15s   kubelet            Started container nginx-fc

and the resulting pivoted rootfs is the following:

# tree -R /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/7bf0015240112d6e16e2a7a8977cbf88d261e4ab813a03861bbf623ff3465de3/rootfs
/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/7bf0015240112d6e16e2a7a8977cbf88d261e4ab813a03861bbf623ff3465de3/rootfs
├── dev
├── etc
│   ├── hostname
│   ├── hosts
│   └── resolv.conf
├── tmp
├── unikernel
│   ├── initrd
│   └── nginx_fc-x86_64
├── urunc.json
├── usr
│   └── local
│       └── bin
│           └── firecracker
└── var
    └── run
        └── secrets
            └── kubernetes.io
                └── serviceaccount

The same should stand for QEMU. AFAIU there's no parameter needed for the pivoted rootfs to be passed to QEMU, as the monitor is spawned from the pivoted rootfs. So everything should be relative to the original pivoted path (maybe I'm wrong here).

@ciprian-barbu
Author

Hi,

I tested this new version and it works fine, I see that the hosts, hostname and resolv.conf files are copied in the pivoted rootfs.

The same should stand for QEMU. AFAIU there's no parameter needed for the pivoted rootfs to be passed to QEMU, as the monitor is spawned from the pivoted rootfs. So everything should be relative to the original pivoted path (maybe I'm wrong here).

I think you are wrong here; qemu is essentially a machine emulator, and you need some sort of emulated device where the rootfs exists. That is why there are backends like virtiofsd or 9pfs, to help pass an unstructured location on the host into the guest, presented as some sort of real (but emulated) device.

With the unikernel that I used in my tests, which lists all files in the filesystem, I still couldn't see the pivoted rootfs contents in the output. So I modified the code a bit to mount "/" from the host (which is correct because urunc performs chroot) inside the unikernel at "/mnt". Mounting it at "/" in the guest creates other problems, as the kernel is not present in the pivoted rootfs.
But at least I could see that the mounted volume exists in the listing, even if it is on an NFS mount.

On the other hand, I was looking at the pivoted rootfs location on the host and the mount was not present, but this is because urunc pivots into a different namespace, so it's not supposed to show up in the global namespace.

It looks like there is some more work needed, I will wait until you get some more time to work on it.

BR,
Ciprian

@ciprian-barbu
Author

ciprian-barbu commented Jun 13, 2025

Here are my resources:

> cat urunc/test-urunc-pivot-nfs.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-urunc-pivot-nfs
spec:
  runtimeClassName: urunc
  #nodeName: ciprian-node2
  volumes:
    - name: nfs-storage
      persistentVolumeClaim:
        claimName: nfsclaim
  containers:
    - name: unikernel-main
      imagePullPolicy: Always # Prevents breaking from crictl prune
      resources:
        limits:
          memory: "512M"
      image: printfilesrecursive-ocipun-cpp:test-loop
      command: [ '/unikernel/main.qemu' ]
      volumeMounts:
        - name: nfs-storage
          mountPath: /mnt/9pfs
Name:                test-urunc-pivot-nfs
Namespace:           default
Priority:            0
Runtime Class Name:  urunc
Service Account:     default
Node:                k8s-node09/10.188.13.149
Start Time:          Fri, 13 Jun 2025 12:34:38 +0300
Labels:              <none>
Annotations:         <none>
Status:              Running
IP:                  10.42.1.167
IPs:
  IP:  10.42.1.167
Containers:
  unikernel-main:
    Container ID:  containerd://4e7ab1f66afb98a4c65b9007a68e07b18ba6e6256696cf2bb575059b29965871
    Image:        ocipun-cpp:test-loop
    Image ID:      printfilesrecursive-ocipun-cpp@sha256:b2a82291e83481a22925a9242689d16ef137bf41f14a8b4aff7b48cca2190cd0
    Port:          <none>
    Host Port:     <none>
    Command:
      /unikernel/main.qemu
    State:          Running
      Started:      Fri, 13 Jun 2025 12:34:41 +0300
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  512M
    Requests:
      memory:     512M
    Environment:  <none>
    Mounts:
      /mnt/9pfs from nfs-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jjhm9 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  nfs-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  nfsclaim
    ReadOnly:   false
  kube-api-access-jjhm9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 15s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 30s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
# tree -R /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/4e7ab1f66afb98a4c65b9007a68e07b18ba6e6256696cf2bb575059b29965871
/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/4e7ab1f66afb98a4c65b9007a68e07b18ba6e6256696cf2bb575059b29965871
├── address
├── config.json
├── init.pid
├── log
├── log.json
├── options.json
├── rootfs
│   ├── dev
│   ├── etc
│   │   ├── hostname
│   │   ├── hosts
│   │   └── resolv.conf
│   ├── lib
│   ├── lib64
│   ├── mnt
│   │   └── 9pfs
│   ├── tmp
│   ├── unikraft
│   │   └── bin
│   │       └── kernel
│   ├── urunc.json
│   ├── usr
│   │   ├── lib
│   │   ├── local
│   │   │   └── bin
│   │   │       └── qemu-system-x86_64
│   │   └── share
│   │       ├── qemu
│   │       └── seabios
│   └── var
│       └── run
│           └── secrets
│               └── kubernetes.io
│                   └── serviceaccount
├── runtime
├── shim-binary-path
└── work -> /var/lib/rancher/k3s/agent/containerd/io.containerd.runtime.v2.task/k8s.io/4e7ab1f66afb98a4c65b9007a68e07b18ba6e6256696cf2bb575059b29965871

23 directories, 14 files

I can see /mnt/9pfs in the listing, but no files in it. However, the unikernel lists the following:

Booting from ROM..^@^@1: Set IPv4 address 10.42.1.167 mask 255.255.255.0 gw 10.42.1.247
en1: Added
en1: Interface is up
Powered by Unikraft Helene (0.18.0~9e36492)
Command-line arguments:
Argument 0: init

Environment variables:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/lib
HOME=/
HOSTNAME=test-urunc-pivot-nfs
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_ADDR=10.43.0.1
KUBERNETES_SERVICE_HOST=10.43.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT=tcp://10.43.0.1:443
KUBERNETES_PORT_443_TCP=tcp://10.43.0.1:443
KUBERNETES_PORT_443_TCP_PROTO=tcp
Listing files in directory: "/"
[DIR] "/etc"
[FILE] "/etc/ld.so.cache"
[FILE] "/etc/resolv.conf"
[FILE] "/etc/hosts"
[FILE] "/init"
[DIR] "/lib"
[DIR] "/lib/x86_64-linux-gnu"
[FILE] "/lib/x86_64-linux-gnu/libc.so.6"
[FILE] "/lib/x86_64-linux-gnu/libm.so.6"
[DIR] "/lib64"
[FILE] "/lib64/ld-linux-x86-64.so.2"
[DIR] "/shared"
[FILE] "/shared/input.in"
[DIR] "/usr"
[DIR] "/usr/local"
[DIR] "/usr/local/lib64"
[FILE] "/usr/local/lib64/libgcc_s.so.1"
[FILE] "/usr/local/lib64/libstdc++.so.6"
[DIR] "/mnt"
[DIR] "/mnt/mnt"
qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might lead to file ID collisions and severe misbehaviours on guest! You should either use a separate export for each device shared from host or use virtfs option 'multidevs=remap'!
[DIR] "/mnt/mnt/9pfs"
[FILE] "/mnt/mnt/9pfs/test123.txt"
[FILE] "/mnt/mnt/9pfs/test456.txt"
[FILE] "/mnt/mnt/9pfs/another-file.txt"
[FILE] "/mnt/mnt/9pfs/out.txt"
[DIR] "/mnt/etc"
[FILE] "/mnt/etc/hosts"
[FILE] "/mnt/etc/hostname"
[FILE] "/mnt/etc/resolv.conf"
[DIR] "/mnt/lib"
[DIR] "/mnt/lib/debug"
[DIR] "/mnt/lib/debug/usr"
[DIR] "/mnt/lib/debug/usr/bin"
[DIR] "/mnt/lib/debug/usr/sbin"
[DIR] "/mnt/lib/debug/usr/lib"
[DIR] "/mnt/lib/debug/usr/lib64"
Error: filesystem error: status: Too many levels of symbolic links [/mnt/lib/debug/usr/.dwz]

So you can see that the unikernel sees the correct contents in /mnt/mnt/9pfs, although on the host nothing is shown. I think this is because the mount is not present in the default namespace on the host, but in the new namespace.
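For anyone reproducing this, the mounts can still be inspected from the host by entering the mount namespace of the urunc/monitor process (the pid below is a placeholder):

sudo nsenter -t <pid> -m -- findmnt
# or inspect it directly:
cat /proc/<pid>/mountinfo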

@ananos
Contributor

ananos commented Jun 13, 2025

thanks for the info @ciprian-barbu and the detailed explanation!

Indeed, what the mount_vol branch does is bring the spec-related volumes into the rootfs of the container (where the monitor will then spawn the unikernel). For the unikernel to see these contents, we need to add parameters to the monitors to allow this (depending on the unikernel framework and/or the monitor).

There is ongoing work for that, tracked here: #109 (and it is essentially the only issue pending for the v0.6.0 release :D). This is work that @cmainas has been doing and it should be ready by the end of this month.

You can check this commit for a quick&dirty solution: https://github.com/urunc-dev/urunc/tree/just_sharedfs but I suspect what you have already done in this branch is similar.

AFAIU you would like the contents to be available in the unikernel, right? So with the current hacks this is feasible, right? Or would you like the contents to be available in the pivoted rootfs as well?

I'll have to get back to you about the namespace issue you mention because I will have to look into it.

@ciprian-barbu
Author

You can check this commit for a quick&dirty solution: https://github.com/urunc-dev/urunc/tree/just_sharedfs but I suspect what you have already done in this branch is similar.

Indeed, this was the source of inspiration for all the work around this pull request.
And of course, the point is to have the volume available inside the unikernel environment; one should not care about what urunc does behind the scenes, and it actually makes a lot of sense for the mounts not to be visible in the default namespace.

If I understand correctly, with the shared fs option there will be 2 choices to run the unikernel:

  • the user specifies the initrd, either from the K8s manifest or in the unikernel itself. This would cause urunc to behave the "normal" way, in which case the pivoted rootfs might be redundant
  • the user doesn't specify the initrd, and urunc will decide to mount it via 9pfs from the location of the pivoted rootfs. In this case I'm wondering if there will be a choice between 9pfs and virtiofsd, or 9pfs will be the only option.

For now I will wait for the work in progress to be merged, and then we can discuss what should happen with this PR.

BR,
Ciprian

@cmainas
Contributor

cmainas commented Jun 25, 2025

Hello @ciprian-barbu ,

we have now merged the work in the mount_vol branch, so please rebase over the main branch. You are correct regarding the mount namespace -- you need to enter that namespace to get a view of the mounts.

For shared-fs, we initially had the following plan. If the guest requires access to files (either in the container's rootfs or from volumes), then urunc, checking the respective annotation (mountRootfs -- WIP), will try to mount the container's rootfs and other volumes in the guest. The way urunc will try to do that depends on the snapshotter in use and each guest's support for block devices/shared-fs. If devmapper is the snapshotter, then urunc will try to use the container's image snapshot directly as a block device (if the guest supports block devices). Otherwise, if the guest supports 9pfs or virtio-fs, then urunc will try to share the container's rootfs with the guest over 9pfs or virtio-fs.
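In rough pseudocode, the selection described above would be something like the following (all names and parameters here are hypothetical, not urunc's actual API):

package rootfs

// chooseRootfsTransport sketches the selection logic described above.
func chooseRootfsTransport(snapshotter string, guestSupportsBlock, guestSupportsSharedFS bool) string {
    if snapshotter == "devmapper" && guestSupportsBlock {
        return "block" // pass the image snapshot directly as a block device
    }
    if guestSupportsSharedFS {
        return "sharedfs" // share the container rootfs over 9pfs or virtio-fs
    }
    return "none" // nothing to do regarding the guest's rootfs
}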

Of course there are use cases where storage is not required and in these scenarios the annotation will be absent and urunc will not have to do anything regarding the guest's rootfs. Yet, the chroot is an extra small step to restrict the access to the host.

In this PR, you suggested another approach, which totally makes sense and seems quite useful. Therefore, both approaches (mounting the whole container rootfs and mounting specific volumes) can co-exist in urunc and, depending on the use case, users can choose the most suitable one. So we will be happy to merge this PR.

@ciprian-barbu
Author

Hi,

Sorry for the delay, I was busy with other stuff.
It's good news that you merged mount_vol, but I would also like to see the WIP you mentioned for mountRootfs, can you give me a link to a branch or pull request?

Ideally, if that WIP results in the possibility of automatically mounting the volumes through any method, I would be happy to drop this PR. If indeed there is no other way to mount volumes on request, then I will go forward with rebasing and fixing the pending TODOs. I'm not sure that having two ways of achieving the same result is beneficial, so that is why I want to understand where the mount_vol work and mountRootfs will end up.

Thanks and regards,
Ciprian

@cmainas
Contributor

cmainas commented Jun 27, 2025

Hello @ciprian-barbu ,

the PR for mounting the container's rootfs through 9pfs in the guest is this one: #194

The PR indeed mounts the volumes inside the container's rootfs (and consequently the guest can access them). However, I would still argue that the two approaches can co-exist.

One argument would be performance. The more files we share through 9pfs, the slower it gets. For instance, in a use case where a service/app does not need access to the whole container rootfs, but only to a couple of files which are accessible at runtime (e.g. Kubernetes secrets), mounting every volume in the container's rootfs and then sharing it with the guest would not be that optimal.

Furthermore, there might be scenarios where setting the guest's rootfs as a shared-fs might not be supported by the guest. I am not sure what will happen in Unikraft if the unikernel already contains a pre-existing rootfs (e.g. embedded initrd) and then we try to mount the rootfs through 9pfs. We have a similar problem with Rumprun (but with block devices), which does not allow us to mount anything at "/". In such cases, the approach you propose will be useful.

@ciprian-barbu
Author

ciprian-barbu commented Jul 14, 2025

Hello,

I'm sorry for the delayed response, I was out of office for a while and only managed to get back to this last week.

I started by experimenting with the latest main, and specifically the com.urunc.unikernel.mountRootfs annotation. But I'm having trouble with the Unikraft unikernels I'm using, which worked fine in the past using urunc from the branch just_sharedfs. Now, I get errors right after the BIOS boot menu:

SeaBIOS (version rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org)


iPXE (http://ipxe.org) 00:02.0 C000 PCI2.10 PnP PMM+1EFD0E30+1EF30E30 C000
Press Ctrl-B to configure iPXE (PCI 00:02.0)...^M


Booting from ROM..^@^@1: Set IPv4 address 10.42.1.128 mask 255.255.255.0 gw 10.42.1.247
en1: Added
en1: Interface is up
[    0.119543] ERR:  [libukboot] Init function at 0x159670 returned error -

From what I can tell, looking at the rootfs structure on the worker, the rootfs is not properly instantiated on the pivot chroot, and some necessary files are not present, e.g. /usr/local/lib64/libgcc_s.so.1 and /usr/local/lib64/libstdc++.so.6.

I'm not very familiar with how the OCI image should look, so I started looking at the instructions about using bunny. However, this implies that the unikernel was already built, so I don't have a complete step-by-step routine for testing urunc with a known image.

I will try using one of the images on harbor.nbfc.io in the meantime, but it will be great if you can provide some details on how to properly package the unikernel.

Thanks and regards,
Ciprian

@cmainas
Contributor

cmainas commented Jul 14, 2025

Hello @ciprian-barbu ,

of course. Let me first summarize the new steps that urunc performs before the execution of the monitor process.

In all cases, urunc will set up a new rootfs, chroot into it and then execute the monitor process. This new rootfs includes the necessary files for the monitor process (i.e. binary, shared libraries, data files) and the container's rootfs. Nothing else should be inside there.

In case the image uses the mountRootfs annotation and the unikernel supports shared-fs (such as Unikraft), urunc will perform some extra steps:

  1. It will bind mount the container's image rootfs in a new directory inside the new rootfs.
  2. It will bind mount all volumes in the container's spec, inside the new directory of the container's rootfs.
  3. It will modify the unikernel's kernel cli argument to instruct it to use 9pfs.
  4. It will append the qemu cli options for 9pfs, passing as a shared directory the directory inside the new rootfs where the container's image rootfs was mounted.

So, if the container image has set the mountRootfs annotation as true, then (and specifically for Unikraft), urunc will pass the container's image rootfs as a shared directory with 9pfs.
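Spelled out as plain commands (paths, mountPath and the mount tag are illustrative, not the exact ones urunc uses), steps 1, 2 and 4 roughly correspond to:

# 1-2: bind mount the container image rootfs and its volumes inside the new (chroot) rootfs
mount --bind /path/to/container/rootfs /path/to/new-rootfs/rootfs
mount --bind /var/lib/kubelet/pods/<pod-uid>/volumes/<volume> /path/to/new-rootfs/rootfs/<mountPath>
# 4: the -fsdev local,...,path=/rootfs and -device virtio-9p-pci pair then exports /rootfs to the guest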

The fastest way to test this out would be to use the harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest image and simply execute it with docker or nerdctl. It will spawn an nginx, but you can set /bin/sh as the command and it will spawn a shell inside the VM. For example:

$ sudo nerdctl run --pull always  --rm -it  --runtime "io.containerd.urunc.v2" --name test harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest /bin/sh                                                                                          
harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest:                       resolved     
  |++++++++++++++++++++++++++++++++++++++|                                                     
manifest-sha256:26c404a45f4f508882c4310f8e566d34dc64194a2354734157de89a7c99c0c63: exists       
  |++++++++++++++++++++++++++++++++++++++|                                                     
config-sha256:50611f976401e7f432058120c6f41e6ec49e88049e488b35632371ab697247aa:   exists       
  |++++++++++++++++++++++++++++++++++++++|                                                     
elapsed: 2.2 s                                                                    total:   0.0 
B (0.0 B/s)                                                                                    
SeaBIOS (version 1.13.0-1ubuntu1.1)                                                            
                                                                                               
                                                                                               
iPXE (http://ipxe.org) 00:02.0 C000 PCI2.10 PnP PMM+0FF8C7D0+0FECC7D0 C000                     
                                                                                               
                                                                                               
                                                                                               
Booting from ROM..                                  
/bin/sh: can't access tty; job control turned off
~ # 
~ # 
~ # ls
bin                   media                 srv
dev                   mnt                   sys
docker-entrypoint.d   opt                   tmp
docker-entrypoint.sh  proc                  urunc.json
etc                   root                  urunit
home                  run                   usr
lib                   sbin                  var
~ # 
~ # 

In order to build an image with bunny and set up the annotation for an existing container image, you can use either Containerfile-like syntax or a bunnyfile.

With Containerfile:

#syntax=harbor.nbfc.io/nubificus/bunny:latest
FROM <existing-image>

LABEL "com.urunc.unikernel.binary"="<path-to-unikernel-binary-inside-the-image>"
LABEL "com.urunc.unikernel.cmdline"="<cmdline>"
LABEL "com.urunc.unikernel.unikernelType"="unikraft"
LABEL "com.urunc.unikernel.hypervisor"="qemu"
LABEL "com.urunc.unikernel.mountRootfs"="true"

With bunnyfile:

#syntax=harbor.nbfc.io/nubificus/bunny:latest
version: v0.1

platforms:
  framework: unikraft
  monitor: qemu
  architecture: x86

rootfs:
  from: <existing-image>
  type: raw

kernel:
  from: <existing-image>
  path: <path-to-unikernel-binary-inside-the-image>

cmdline: <cmdline>

Make sure to replace:

  • <existing-image>
  • <path-to-unikernel-binary-inside-the-image>
  • <cmdline>

Also, keep in mind that urunc does not support the combination of both initrd and mounting the container's rootfs as the guest's rootfs. Therefore, if the container image specifies an initrd (with the respective annotation), urunc will not share the container's rootfs with the guest.

@ciprian-barbu
Author

Hello,

Forgive me, but I think there is some context missing from here... I spent the last few days looking at the bunny documentation, talked to the Unikraft guys, and I still don't think I have the full picture.

I thought it would be easy to build a unikernel which I can use to test urunc with mountRootfs: "true", but this is really not the case.

According to the Unikraft developers, their kraftkit can only generate a cpio rootfs, which, like you said before, cannot work with 9pfs. So instead I'm trying to find some instructions on building and packaging a Unikraft unikernel (I'm not familiar with other frameworks) which can work. How exactly do you go about that?

Is it supposed to work something like this?

  1. kraft build --arch x86_64 --plat qemu
  2. get the generated initramfs-x86_64.cpio and unpack it somewhere, e.g. cat initramfs-x86_64.cpio | cpio -idmv
  3. docker build -f bunnyfile -t image:tag .

Am I missing something? I get the following error:
qemu-system-x86_64: Failed to open file '/.boot/rootfs'

Thanks and regards,
Ciprian

@cmainas
Contributor

cmainas commented Jul 23, 2025

Hello @ciprian-barbu ,

I think it is a bit simpler than your description. The only requirement is a Unikraft unikernel which supports 9pfs and can mount it as its rootfs. Let's try as an example a Unikraft Nginx unikernel that uses shared-fs for its rootfs.

  1. We will use nginx 1.25 from unikraft's catalog
  2. We need to modify the Kraftfile in order to configure the unikernel to not automount an embedded initrd. The diff is the following one:
diff --git a/library/nginx/1.25/Kraftfile b/library/nginx/1.25/Kraftfile
index dfde634..8b97cff 100644
--- a/library/nginx/1.25/Kraftfile
+++ b/library/nginx/1.25/Kraftfile
@@ -8,11 +8,11 @@ cmd: ["/usr/bin/nginx"]
 
 template:
   source: https://github.com/unikraft/app-elfloader.git
-  version: staging
+  version: stable
 
 unikraft:
   source: https://github.com/unikraft/unikraft.git
-  version: staging
+  version: stable
   kconfig:
     # Configurations options for app-elfloader
     # (they can't be part of the template atm)
@@ -31,6 +31,7 @@ unikraft:
     CONFIG_HAVE_PAGING_DIRECTMAP: 'y'
     CONFIG_HAVE_PAGING: 'y'
     CONFIG_I8042: 'y'
+    CONFIG_LIB9PFS: 'y'
     CONFIG_LIBDEVFS_AUTOMOUNT: 'y'
     CONFIG_LIBDEVFS_DEV_NULL: 'y'
     CONFIG_LIBDEVFS_DEV_STDOUT: 'y'
@@ -94,13 +95,12 @@ unikraft:
     CONFIG_LIBUKVMEM_DEMAND_PAGE_IN_SIZE: 12
     CONFIG_LIBUKVMEM_PAGEFAULT_HANDLER_PRIO: 4
     CONFIG_LIBUKVMEM: 'y'
-    CONFIG_LIBVFSCORE_AUTOMOUNT_CI: 'y'
-    CONFIG_LIBVFSCORE_AUTOMOUNT_CI_EINITRD: 'y'
     CONFIG_LIBVFSCORE_AUTOMOUNT_UP: 'y'
     CONFIG_LIBVFSCORE_AUTOMOUNT: 'y'
     CONFIG_LIBVFSCORE_NONLARGEFILE: 'y'
     CONFIG_LIBVFSCORE: 'y'
     CONFIG_LIBUK9P: 'y'
+    CONFIG_LIBUK9PFS: 'y'
     CONFIG_OPTIMIZE_DEADELIM: 'y'
     CONFIG_OPTIMIZE_LTO: 'y'
     CONFIG_PAGING: 'y'
@@ -117,7 +117,7 @@ unikraft:
 libraries:
   lwip:
     source: https://github.com/unikraft/lib-lwip.git
-    version: staging
+    version: stable
     kconfig:
       CONFIG_LWIP_LOOPIF: 'y'
       CONFIG_LWIP_UKNETDEV: 'y'
@@ -139,8 +139,7 @@ libraries:
       CONFIG_LWIP_ICMP: 'y'
   libelf:
     source: https://github.com/unikraft/lib-libelf.git
-    version: staging
+    version: stable
 
 targets:
-- fc/x86_64
 - qemu/x86_64
  3. We can then build it with kraft build
  4. Now we should pack it with bunny. A bunnyfile could be the following:
#syntax=harbor.nbfc.io/nubificus/bunny:latest
version: v0.1

platforms:
  framework: unikraft
  monitor: qemu
  architecture: x86

rootfs:
  from: nginx:1.25.3-bookworm
  type: raw

kernel:
  from: local
  path: nginx_qemu-x86_64

cmdline: "/bin/ls /"

As a rootfs, we choose the same image that is defined in the Dockerfile that kraft uses to build the initrd. But we could use any other image too. Also, keep in mind that I used /bin/ls as a cmdline, because the default configuration of nginx uses multiple processes and this fails in Unikraft.

  5. We can run the above image with urunc:
sudo docker run -m 512M --runtime "io.containerd.urunc.v2" --rm -it --name test nginx-qemu-unikraft-raw:test
SeaBIOS (version 1.13.0-1ubuntu1.1)


iPXE (http://ipxe.org) 00:02.0 C000 PCI2.10 PnP PMM+4308C7D0+42FCC7D0 C000
                                                                              


Booting from ROM..1: Set IPv4 address 172.17.0.2 mask 255.255.255.0 gw 172.17.0.1
en1: Added
en1: Interface is up
qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might !
Powered by Unikraft Pan (0.19.0~ecbcb2d)
bin   docker-entrypoint.d   home   media  proc  sbin  tmp         var
boot  docker-entrypoint.sh  lib    mnt    root  srv   urunc.json
dev   etc                   lib64  opt    run   sys   usr
[    0.823012] CRIT: [libvfscore] Assertion failure: dp->d_refcnt > 0
  6. In order to boot nginx, we will need to modify the rootfs image. In particular, we need to replace nginx's config file with Unikraft's one and remove some log files. Similar steps as in the Dockerfile that kraft uses.
FROM nginx:1.25.3-bookworm

COPY nginx.conf /etc/nginx/nginx.conf

RUN rm /var/log/nginx/error.log && rm /var/log/nginx/access.log

We build the above with docker:

sudo docker build -f Dockerfile -t nginx/single/process/config .

and we use the new image as the rootfs in the bunnyfile. We can also update the cmdline to /usr/sbin/nginx or just define it at runtime later.
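For instance, the updated bunnyfile could then look like this (using the image tag from the docker build step above):

#syntax=harbor.nbfc.io/nubificus/bunny:latest
version: v0.1

platforms:
  framework: unikraft
  monitor: qemu
  architecture: x86

rootfs:
  from: nginx/single/process/config
  type: raw

kernel:
  from: local
  path: nginx_qemu-x86_64

cmdline: "/usr/sbin/nginx"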

  7. After rebuilding the image with the new rootfs, we can run it and nginx should work:
$ sudo docker run -m 1G --pull always --runtime "io.containerd.urunc.v2" --rm -it --name test nginx-qemu-unikraft-raw:dltme /usr/sbin/nginx

SeaBIOS (version 1.13.0-1ubuntu1.1)


iPXE (http://ipxe.org) 00:02.0 C000 PCI2.10 PnP PMM+4308C7D0+42FCC7D0 C000
                                                                             


Booting from ROM..1: Set IPv4 address 172.17.0.2 mask 255.255.255.0 gw 172.17.0.1
en1: Added
en1: Interface is up
qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might !
Powered by Unikraft Pan (0.19.0~ecbcb2d)
  6. We can also curl the nginx server:
$ curl 172.17.0.2
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.25.3</center>
</body>
</html>

which returns 404, since I forgot to add /wwwroot/index.html in the rootfs.
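To get an actual page back, a hypothetical follow-up Dockerfile could add that file on top of the single-process image built earlier (assuming /wwwroot is the document root used by the Unikraft nginx.conf, as the path above suggests):

FROM nginx/single/process/config

# /wwwroot/index.html is the file the 404 above complains about being missing
RUN mkdir -p /wwwroot && echo '<h1>hello from urunc + Unikraft</h1>' > /wwwroot/index.html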

Of course the above method uses the whole nginx image as rootfs, but you can significantly reduce it, as Unikraft does in the respective Dockerfile. Hence, you could use the Dockerfile of the respective example to create a container image rootfs which can later be declared in the bunnyfile. Alternatively, bunny can build this rootfs for you. You can find an example in urunc's docs.

Also, playing around with Unikraft's configuration should eliminate the vfscore assertion failure shown above.

@ciprian-barbu
Copy link
Author

Thank you very much @cmainas for the details; this is exactly what I was looking for: very specific instructions for Unikraft. I will try them and get back to you as soon as I can.

I was actually struggling to understand how Unikraft unikernels package their images and what urunc expects them to look like, so this information will be very useful. Perhaps it could go into the bunny documentation as a step-by-step howto for Unikraft.

I also want to mention that I tried out the image you mentioned before, harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest, and I was surprised to see that the rootfs is specified as -drive format=raw,if=none,id=hd0,file=/dev/mapper/containerd-pool-snap-129, with none of the 9pfs-specific qemu parameters present.

I'm still looking at the code and scratching my head, but am I missing something?

Best regards,
Ciprian

@ciprian-barbu
Copy link
Author

ciprian-barbu commented Jul 23, 2025

Here are the details I used in my k3s environment. You can ignore the nfs-storage part; it doesn't really have anything to do with how the unikernel starts, but I'm using it to validate that any volumes specified in the Kubernetes template get mounted into the rootfs, which then becomes available inside the guest because 9pfs mounts the entire file system.

Kubernetes template:

apiVersion: v1
kind: Pod
metadata:
  name: test-nginx-qemu-linux-raw
  annotations:
    #com.urunc.unikernel.ninePFSMntPoint: "/mnt/9pfs"
    com.urunc.unikernel.mountRootfs: "true"
spec:
  runtimeClassName: urunc
  nodeName: ciprian-node2
  volumes:
    - name: nfs-storage
      persistentVolumeClaim:
        claimName: nfsclaim
  containers:
    - name: unikernel-main
      imagePullPolicy: Always # Prevents breaking from crictl prune
      resources:
        limits:
          memory: "256M"
      image: harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest
      command: [ '/bin/sleep' ]
      #args: [ '-c', 'find' ]
      args: [ '3600' ]
      volumeMounts:
        - name: nfs-storage
          mountPath: /mnt/9pfs

And here is the qemu command on the worker; you can see that I'm overriding the command to "sleep 3600" so I can poke around and look at config.json, state.json and the other Kubernetes resources:

root       40405  2.1  2.1 745364 85752 ?        Ssl  13:07   0:00 /usr/local/bin/qemu-system-x86_64 -m 256M -L /usr/share/qemu -cpu host -enable-kvm -nographic -vga none -kernel /.boot/kernel -net nic,model=virtio -net tap,script=no,downscript=no,ifname=tap0_urunc -device virtio-blk-pci,id=blk0,drive=hd0 -drive format=raw,if=none,id=hd0,file=/dev/mapper/containerd-pool-snap-135 -no-reboot -serial stdio -nodefaults -append panic=-1 console=ttyS0 root=/dev/vda rw ip=10.42.5.57::10.42.5.1:255.255.255.0:urunc:eth0:off PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=test-nginx-qemu-linux-raw KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT_443_TCP_ADDR=10.43.0.1 NGINX_SERVICE_HOST=10.43.157.29 NGINX_PORT_80_TCP_PROTO=tcp NGINX_PORT_80_TCP_PORT=80 KUBERNETES_PORT=tcp://10.43.0.1:443 KUBERNETES_PORT_443_TCP_PROTO=tcp NGINX_SERVICE_PORT=80 NGINX_PORT=tcp://10.43.157.29:80 NGINX_PORT_8080_TCP=tcp://10.43.157.29:8080 NGINX_PORT_8080_TCP_PROTO=tcp KUBERNETES_SERVICE_HOST=10.43.0.1 KUBERNETES_SERVICE_PORT_HTTPS=443 KUBERNETES_PORT_443_TCP=tcp://10.43.0.1:443 NGINX_PORT_80_TCP=tcp://10.43.157.29:80 NGINX_PORT_8080_TCP_PORT=8080 KUBERNETES_SERVICE_PORT=443 NGINX_SERVICE_PORT_METRICS=8080 NGINX_PORT_80_TCP_ADDR=10.43.157.29 NGINX_PORT_8080_TCP_ADDR=10.43.157.29 NGINX_SERVICE_PORT_HTTP=80 init=/bin/sleep -- 3600

@cmainas
Copy link
Contributor

cmainas commented Jul 23, 2025

Hello @ciprian-barbu ,

I am glad that the above information was helpful. We definitely need to improve our documentation. Any suggestions are welcome.

Regarding the linux image and the use of a block device instead of 9pfs, this is the correct behavior. The mountRootfs annotation is common to both block-based and 9pfs-based rootfs mounting. Moreover, mostly for security reasons, we chose to give priority to the block-based approach. Therefore, if the user has configured a block-based snapshotter (e.g. devmapper) for urunc and the unikernel/guest supports block devices, then urunc will use the container's image snapshot as a block device when spawning the VM. Unfortunately, this means that the extra volumes will not be present inside the VM (at least for the time being).

In the case of Unikraft, since urunc is not aware of any block support on its side, it will choose the 9pfs solution.

If you do not want to use the block device as the rootfs and you prefer 9pfs even with Linux, you can change the default snapshotter for urunc. To do this, assuming you have a containerd configuration similar to the one in the installation guide, replace the snapshotter line with the following one:

    snapshotter = "overlayfs"

or simply remove it if the default snapshotter is overlayfs or any other non-block-based snapshotter.
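For context, in a containerd config.toml similar to the one from the installation guide, the urunc runtime section would then look roughly like this (other options omitted; the exact section paths can differ between containerd and k3s versions):

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.urunc]
  runtime_type = "io.containerd.urunc.v2"
  # ... other urunc options from the installation guide ...
  # A non-block-based snapshotter makes urunc fall back to 9pfs for the rootfs:
  snapshotter = "overlayfs"

Remember to restart containerd (or the k3s service) after editing the file.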

@ciprian-barbu
Copy link
Author

I have new details

When running with the harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest image, I get the following:

# ls /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/c9bdd825e5c617b61c7cc70a971019a0a505f16bdae1395de6b0f787f8ab0606
state.json

# cat state.json
{
  "ociVersion": "1.2.0",
  "id": "d0a43e8a91ba87019ace9348beda78d2765b8442fb55ae3fbbd9f1c9c59b13c1",
  "status": "running",
  "pid": 46706,
  "bundle": "/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/d0a43e8a91ba87019ace9348beda78d2765b8442fb55ae3fbbd9f1c9c59b13c1",
  "annotations": {
    "com.urunc.unikernel.binary": "/.boot/kernel",
    "com.urunc.unikernel.cmdline": "/urunit /usr/sbin/nginx -g 'daemon off;error_log stderr debug;",
    "com.urunc.unikernel.hypervisor": "qemu",
    "com.urunc.unikernel.mountRootfs": "true",
    "com.urunc.unikernel.unikernelType": "linux",
    "io.kubernetes.cri.container-name": "unikernel-main",
    "io.kubernetes.cri.container-type": "container",
    "io.kubernetes.cri.image-name": "harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest",
    "io.kubernetes.cri.sandbox-id": "444b959ec8796c5ec28949d92442f39a4f49bc9c558ce5af2f4be5eeaaee64da",
    "io.kubernetes.cri.sandbox-name": "test-nginx-qemu-linux-raw",
    "io.kubernetes.cri.sandbox-namespace": "default",
    "io.kubernetes.cri.sandbox-uid": "0b49345f-6c1d-43fe-be0d-c8f5f6b5b64a"
  }
}

I was surprised to see that unikernelType is linux, and I also saw that the cmdline uses urunit, so I looked that up and found some info. If I understand correctly, these images are built like an actual Linux VM, with a real kernel, unlike Unikraft.

But what is important is that when com.urunc.unikernel.unikernelType is linux, urunc will enter this block of code:
https://github.com/urunc-dev/urunc/pull/194/files#diff-466e4ae9db540414965a782374a5200437b4f4fdf75c42056168d1cebfc3a2b1R319

Then, at line 324, rootFsDevice.FsType returns the type of filesystem used by the main partition; for me it is ext2. Further down, at line 335, because unikernelParams.RootFSType is not empty, it will not check whether 9pfs is supported.

For the unikraft unikernel type, supportsBlock returns false and the behavior is different. I added some traces with fmt.Println and I could see that for the linux unikernel the RootFSType is calculated as Block, while for my Unikraft image it is 9pfs. In both instances I'm setting com.urunc.unikernel.mountRootfs to "true".
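To summarize the decision flow I'm observing, here is a simplified, purely illustrative Go sketch; the names are made up and do not match the actual urunc sources:

package main

import "fmt"

// chooseRootfsType sketches the behavior described above; it is not urunc code.
func chooseRootfsType(fsType string, supportsBlock, mountRootfs bool) string {
	if !mountRootfs {
		// Without com.urunc.unikernel.mountRootfs the rootfs is not shared at all.
		return "none"
	}
	// linux guests on a block-based snapshotter: the snapshot filesystem is
	// detected (e.g. "ext2" from devmapper) and block is preferred, so the
	// 9pfs path is never even checked.
	if fsType != "" && supportsBlock {
		return "block"
	}
	// Unikraft does not report block support, so urunc falls back to 9pfs.
	return "9pfs"
}

func main() {
	fmt.Println(chooseRootfsType("ext2", true, true)) // linux image on devmapper -> block
	fmt.Println(chooseRootfsType("", false, true))    // my Unikraft image -> 9pfs
}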

But my problem is not how the linux unikernel type is handled; it is how the unikraft unikernel type is handled. For the unikernel I built myself, something probably fails when preparing the rootfs. I will continue tomorrow with the instructions provided to build a Unikraft unikernel and do a bit more digging.

But I'm wondering if it is even possible to test the mountRootfs feature using images with unikernelType set to linux.

Thank you,
/Ciprian

@cmainas
Copy link
Contributor

cmainas commented Jul 23, 2025

Hello @ciprian-barbu ,

you are correct. With the new release, urunc is able to spawn "normal" containers using a minimal Linux kernel that is part of the container image. Your dive into the urunc code was also totally correct. You can find information on how to use 9pfs with Linux too in my previous comment #169 (comment).

TL;DR: You simply need to use a non-block-based snapshotter (e.g. overlayfs). With Unikraft, urunc will try to mount the rootfs through 9pfs anyway.
