CDI DataVolumes
Containerized Data Importer (CDI for short) is a data import service for Kubernetes, designed with KubeVirt in mind. Thanks to CDI, we can now enjoy DataVolumes, which greatly improve the workflow of managing KubeVirt and its storage.
What it does
DataVolumes are an abstraction on top of the Kubernetes PVC (Persistent Volume Claim) resource, and they leverage other CDI features to ease the process of importing data into a Kubernetes cluster.
DataVolumes can be defined on their own or embedded within a VirtualMachine resource definition. The first method can be used to orchestrate events based on the DataVolume status phases, while the second eases the process of providing storage for a VM.
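As a quick illustration of the first method, a standalone DataVolume manifest is a short sketch like the following (the name is arbitrary; the apiVersion and the pvc/source fields match what appears later in this post):
---
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: example-dv
spec:
  source:
    http:
      url: "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2"
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi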
How does it work?
In this blog post, I’d like to focus on the second method, embedding the DataVolume within a VirtualMachine definition, which might well be the most immediate benefit of this feature. Let’s get started!
Environment description
- OpenShift
For testing DataVolumes, I’ve spawned a new OpenShift cluster that uses dynamic provisioning for storage, backed by OpenShift Container Storage (GlusterFS), so Persistent Volumes (PVs for short) are created on demand. Other than that, it’s a regular OpenShift cluster, running with a single master (also used for infrastructure components) and two compute nodes.
- CDI
We also need CDI, of course. It can be deployed either together with KubeVirt or independently; the instructions can be found in the project’s GitHub repo.
- KubeVirt
Last but not least, we’ll need KubeVirt to run the VMs that will make use of the DataVolumes.
Enabling DataVolumes feature
As of this writing, DataVolumes have to be enabled through a feature gate. For KubeVirt, this is achieved by creating the kubevirt-config ConfigMap in the namespace where KubeVirt has been deployed, by default kube-system.
Let’s create the ConfigMap with the following definition:
---
apiVersion: v1
data:
  feature-gates: DataVolumes
kind: ConfigMap
metadata:
  name: kubevirt-config
  namespace: kube-system
$ oc create -f kubevirt-config-cm.yml
Alternatively, the following one-liner can also be used to achieve the same result:
$ oc create configmap kubevirt-config --from-literal feature-gates=DataVolumes -n kube-system
If the ConfigMap was already present on the system, just use oc edit to add the DataVolumes feature gate under the data field, like in the YAML above.
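To confirm the feature gate is in place, the ConfigMap can be inspected directly:
$ oc get configmap kubevirt-config -n kube-system -o yaml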
If everything went as expected, we should see the following log lines on the virt-controller pods:
level=info timestamp=2018-10-09T08:16:53.602400Z pos=application.go:173 component=virt-controller msg="DataVolume integration enabled"
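That line can be retrieved with a label selector, assuming the default kubevirt.io=virt-controller pod label:
$ oc logs -n kube-system -l kubevirt.io=virt-controller | grep 'DataVolume integration enabled'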
NOTE: It’s worth noting the values in the ConfigMap are not dynamic: virt-controller and virt-api need to be restarted by scaling their deployments down and back up again. Just remember to scale them back up to the same number of replicas they previously had.
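A minimal sketch of that restart, assuming both components run as deployments in kube-system with two replicas each:
$ oc scale deployment/virt-controller deployment/virt-api --replicas=0 -n kube-system
$ oc scale deployment/virt-controller deployment/virt-api --replicas=2 -n kube-system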
Creating a VirtualMachine embedding a DataVolume
Now that the cluster is ready to use the feature, let’s have a look at our VirtualMachine definition, which includes a DataVolume.
apiVersion: kubevirt.io/v1alpha2
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: testvm1
  name: testvm1
spec:
  dataVolumeTemplates:
  - metadata:
      name: centos7-dv
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
      source:
        http:
          url: "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2"
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: testvm1
    spec:
      domain:
        cpu:
          cores: 1
        devices:
          disks:
          - volumeName: test-datavolume
            name: disk0
            disk:
              bus: virtio
          - name: cloudinitdisk
            volumeName: cloudinitvolume
            cdrom:
              bus: virtio
        resources:
          requests:
            memory: 8Gi
      volumes:
      - dataVolume:
          name: centos7-dv
        name: test-datavolume
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            hostname: testvm1
            users:
              - name: kubevirt
                gecos: KubeVirt Project
                sudo: ALL=(ALL) NOPASSWD:ALL
                passwd: $6$JXbc3063IJir.e5h$ypMlYScNMlUtvQ8Il1ldZi/mat7wXTiRioGx6TQmJjTVMandKqr.jJfe99.QckyfH/JJ.OdvLb5/OrCa8ftLr.
                shell: /bin/bash
                home: /home/kubevirt
                lock_passwd: false
        name: cloudinitvolume
The new addition to a regular VirtualMachine definition is the dataVolumeTemplates block, which triggers the import of the CentOS 7 cloud image defined in the url field and stores it on a PV. The resulting DataVolume will be named centos7-dv; referenced in the volumes section, it serves as the boot disk (disk0) for our VirtualMachine.
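Assuming the manifest above is saved as testvm1.yml (the filename is arbitrary), it is applied as usual:
$ oc create -f testvm1.yml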
Going ahead and applying the above manifest to our cluster results in the following set of events:
- The DataVolume is created, triggering the creation of a PVC; using the dynamic provisioning configured on the cluster, a PV is then provisioned to satisfy the PVC.
- An importer pod is started. This pod is the one actually downloading the image defined in the url field and storing it on the provisioned PV.
- Once the image has been downloaded and stored, the DataVolume status changes to Succeeded; from that point, the virt-controller goes ahead and schedules the VirtualMachine, creating its virt-launcher pod.
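These phases can be followed live with a couple of watches run in the VM’s namespace:
$ oc get dv centos7-dv -w
$ oc get pods -w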
Taking a look at the resources created after applying the VirtualMachine manifest, we can see the following:
$ oc get pods
NAME READY STATUS RESTARTS AGE
importer-centos7-dv-t9zx2 0/1 Completed 0 11m
virt-launcher-testvm1-cpt8n 1/1 Running 0 8m
Let’s look at the importer pod logs to understand what it did:
$ oc logs importer-centos7-dv-t9zx2
I1009 12:37:45.384032 1 importer.go:32] Starting importer
I1009 12:37:45.393461 1 importer.go:37] begin import process
I1009 12:37:45.393519 1 dataStream.go:235] copying "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2" to "/data/disk.img"...
I1009 12:37:45.393569 1 dataStream.go:112] IMPORTER_ACCESS_KEY_ID and/or IMPORTER_SECRET_KEY are empty
I1009 12:37:45.393606 1 dataStream.go:298] create the initial Reader based on the endpoint's "https" scheme
I1009 12:37:45.393665 1 dataStream.go:208] Attempting to get object "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2" via http client
I1009 12:37:45.762330 1 dataStream.go:314] constructReaders: checking compression and archive formats: /centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
I1009 12:37:45.841564 1 dataStream.go:323] found header of type "qcow2"
I1009 12:37:45.841618 1 dataStream.go:338] constructReaders: no headers found for file "/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2"
I1009 12:37:45.841635 1 dataStream.go:340] done processing "/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2" headers
I1009 12:37:45.841650 1 dataStream.go:138] NewDataStream: endpoint "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2"'s computed byte size: 8589934592
I1009 12:37:45.841698 1 dataStream.go:566] Validating qcow2 file
I1009 12:37:46.848736 1 dataStream.go:572] Doing streaming qcow2 to raw conversion
I1009 12:40:07.546308 1 importer.go:43] import complete
So, following the events, we can see it fetched the image from the defined url, validated its format, and converted it to raw so QEMU can use it.
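That streaming conversion is conceptually the same operation qemu-img performs locally; a sketch, assuming the image had been downloaded by hand:
$ qemu-img convert -f qcow2 -O raw CentOS-7-x86_64-GenericCloud.qcow2 disk.img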
$ oc describe dv centos7-dv
Name: centos7-dv
Namespace: test-dv
Labels: kubevirt.io/created-by=1916da5f-cbc0-11e8-b467-c81f666533c3
Annotations: kubevirt.io/owned-by=virt-controller
API Version: cdi.kubevirt.io/v1alpha1
Kind: DataVolume
Metadata:
Creation Timestamp: 2018-10-09T12:37:34Z
Generation: 1
Owner References:
API Version: kubevirt.io/v1alpha2
Block Owner Deletion: true
Controller: true
Kind: VirtualMachine
Name: testvm1
UID: 1916da5f-cbc0-11e8-b467-c81f666533c3
Resource Version: 2474310
Self Link: /apis/cdi.kubevirt.io/v1alpha1/namespaces/test-dv/datavolumes/centos7-dv
UID: 19186b29-cbc0-11e8-b467-c81f666533c3
Spec:
Pvc:
Access Modes:
ReadWriteOnce
Resources:
Requests:
Storage: 10Gi
Source:
Http:
URL: https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
Status:
Phase: Succeeded
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 29s (x13 over 14m) datavolume-controller DataVolume synced successfully
Normal Synced 18s datavolume-controller DataVolume synced successfully
The DataVolume description matches what was defined under dataVolumeTemplates. Now, since we know it uses a PV/PVC underneath, let’s have a look:
$ oc describe pvc centos7-dv
Name: centos7-dv
Namespace: test-dv
StorageClass: glusterfs-storage
Status: Bound
Volume: pvc-191d27c6-cbc0-11e8-b467-c81f666533c3
Labels: app=containerized-data-importer
cdi-controller=centos7-dv
Annotations: cdi.kubevirt.io/storage.import.endpoint=https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
cdi.kubevirt.io/storage.import.importPodName=importer-centos7-dv-t9zx2
cdi.kubevirt.io/storage.pod.phase=Succeeded
pv.kubernetes.io/bind-completed=yes
pv.kubernetes.io/bound-by-controller=yes
volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/glusterfs
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 10Gi
Access Modes: RWO
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ProvisioningSucceeded 18m persistentvolume-controller Successfully provisioned volume pvc-191d27c6-cbc0-11e8-b467-c81f666533c3 using kubernetes.io/glusterfs
It’s important to pay attention to the annotations; these are monitored and set by CDI. CDI triggers an import when it detects the cdi.kubevirt.io/storage.import.endpoint annotation, assigns a pod as the import task owner, and keeps the pod phase annotation updated.
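In fact, a DataVolume isn’t strictly required for an import to happen: annotating a bare PVC with that endpoint has the same effect. A minimal sketch, with an arbitrary name and size:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-import-pvc
  annotations:
    cdi.kubevirt.io/storage.import.endpoint: "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2"
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi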
At this point, everything is in place: the DataVolume has its underlying components and the image has been imported, so the VirtualMachine can start the VirtualMachineInstance based on its definition, using the CentOS 7 image as its boot disk. As users, we can connect to its console as usual, for instance by running the following command:
$ virtctl console testvm1
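If an SSH session is preferred over the serial console, one option is exposing the VM through a service; a sketch, assuming the virtctl expose subcommand is available in the installed KubeVirt version:
$ virtctl expose virtualmachine testvm1 --name testvm1-ssh --port 22 --type NodePort
We can then SSH to any node’s address on the allocated NodePort as the kubevirt user defined in the cloud-init data.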
Cleaning it up
Once we’re happy with the results, it’s time to clean up all these tests. The task is easy:
$ oc delete vm testvm1
Once the VM (and its associated VMI) is gone, all the underlying storage resources are removed as well; there is no trace of the PVC, PV or DataVolume.
$ oc get dv centos7-dv
$ oc get pvc centos7-dv
$ oc get pv pvc-191d27c6-cbc0-11e8-b467-c81f666533c3
All three commands returned No resources found.