Mounting of storage on user pods was slow

It seems like it takes a while to mount volumes to the pods, which impacts spawn time significantly. I'm not sure yet which part of the mounting process takes the time, though. There were many mounts happening:
A 10GB GCE PD through a PVC / PV.
An NFS server mount for the /home/curriculum folder that we did a gitpuller pull from, to avoid relying on GitHub being up.
A set of k8s ConfigMaps was also mounted.
If it's the mounting that takes time, how much time does it take? If mounting an NFS PVC is slow but mounting a hostPath volume is fast, one could mount the NFS storage on each node and then use a hostPath volume to access that mount indirectly. This is what @yuvipanda's https://github.com/yuvipanda/k8s-nfs-mounter does, but it's also something Yuvi is transitioning away from.
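To illustrate the hostPath indirection, here is a minimal sketch of KubeSpawner settings (placed in something like jupyterhub_config.py) that expose a node-level NFS mount to user pods. The /mnt/nfs/curriculum path and the volume/mount names are assumptions for illustration, not what this deployment actually used:

```python
# Sketch only: assumes each node already has the NFS export mounted at
# /mnt/nfs/curriculum (e.g. via a DaemonSet or the k8s-nfs-mounter approach).
c.KubeSpawner.volumes = [
    {
        "name": "shared-curriculum",
        "hostPath": {"path": "/mnt/nfs/curriculum", "type": "Directory"},
    }
]
c.KubeSpawner.volume_mounts = [
    {
        "name": "shared-curriculum",
        "mountPath": "/home/jovyan/curriculum",
        "readOnly": True,
    }
]
```

Since the kubelet only has to bind-mount a local directory, the per-pod mount cost should be small; the NFS mount itself is paid once per node instead of once per pod.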
NFS read/write throughput and the rsync cache workaround
Google's managed NFS service, Filestore, was not promising more than a sustained throughput of 100MB/sec, which is a bit low if we want hundreds of users to have access to 1GB datasets. Because of this, I ended up running a DaemonSet to create a pod on each node, where I used rsync to stash away a local replica; see the sketch below. rsync was used instead of cp or similar in order to ensure we could stay up to date with changes. Some related PRs for this were #60, #63, #66, #100.
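For reference, the DaemonSet pods essentially just need to keep a node-local directory in sync with the NFS source. Below is a minimal sketch of such a sync loop; the paths and the five-minute interval are assumptions for illustration rather than the exact values used:

```python
import subprocess
import time

# Assumed paths: the Filestore share mounted on the node, and a node-local
# directory that user pods can reach via a hostPath volume.
NFS_SRC = "/mnt/filestore/datasets/"
LOCAL_DST = "/mnt/disks/dataset-cache/"

while True:
    # --archive preserves permissions and timestamps; --delete removes files
    # that disappeared upstream, so the local cache tracks the NFS source.
    subprocess.run(
        ["rsync", "--archive", "--delete", NFS_SRC, LOCAL_DST],
        check=False,  # keep syncing even if a single pass fails
    )
    time.sleep(300)  # re-sync every five minutes
```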
NFS quotas

While we didn't use NFS storage for the users, we could have, and then it would have been relevant to try to solve the storage quota issue: you typically can't easily set quotas for individual users.
@yuvipanda has demonstrated one solution using a self-hosted NFS server backed by storage on an XFS filesystem, and one can also use a Helm chart called nfs-provisioner to deploy an NFS server. See also pangeo-data/pangeo-cloud-federation#654.
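To make the quota idea concrete, here is a rough sketch of enforcing per-user limits with XFS project quotas on the filesystem backing a self-hosted NFS server. This is not @yuvipanda's implementation; the /export path, project id, and 10GB limit are assumptions, and the filesystem must be mounted with the prjquota option:

```python
import subprocess

EXPORT = "/export"            # assumed XFS mount backing the NFS server
username = "example-user"     # hypothetical user, doubling as the project name
project_id = 1001             # hypothetical unique project id
home = f"{EXPORT}/home/{username}"

# xfs_quota reads project definitions from /etc/projects and /etc/projid.
with open("/etc/projects", "a") as f:
    f.write(f"{project_id}:{home}\n")
with open("/etc/projid", "a") as f:
    f.write(f"{username}:{project_id}\n")

# Tag the directory tree as belonging to the project, then set a hard limit.
subprocess.run(["xfs_quota", "-x", "-c", f"project -s {username}", EXPORT], check=True)
subprocess.run(["xfs_quota", "-x", "-c", f"limit -p bhard=10g {username}", EXPORT], check=True)
```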
NFS archiving

A challenge with a bootcamp like this is that we intend to tear it down after a while, but it's not so great to delete users' access to their storage. With that in mind, an option could be to archive it in some object storage and provide a way for users to access it later without an NFS server running.
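As an illustration of the archiving step, a small script could tar up each home directory and upload it to an object storage bucket. This is only a sketch assuming Google Cloud Storage via the google-cloud-storage client; the export path and bucket name are made up:

```python
import pathlib
import subprocess

from google.cloud import storage

HOMES = pathlib.Path("/export/home")   # assumed NFS export holding user homes
BUCKET = "bootcamp-home-archive"       # hypothetical bucket name

bucket = storage.Client().bucket(BUCKET)

for home in sorted(p for p in HOMES.iterdir() if p.is_dir()):
    tarball = pathlib.Path("/tmp") / f"{home.name}.tar.gz"
    # Archive the home directory relative to the export root.
    subprocess.run(["tar", "czf", str(tarball), "-C", str(HOMES), home.name], check=True)
    bucket.blob(f"homes/{home.name}.tar.gz").upload_from_filename(str(tarball))
    tarball.unlink()
```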
Access to the archived storage should not be public, so a simple solution would be to generate a password for each user, which could be emailed or accessed through JupyterHub somehow, since JupyterHub knows about the user. This could perhaps make sense to develop as an external JupyterHub service, which would be aware of the JupyterHub identity.
@yuvipanda is exploring this, but no GitHub repo is up yet to reference.
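In the meantime, a rough sketch of what such a service might look like: a small Tornado handler authenticated against the Hub that redirects the logged-in user to a short-lived signed URL for their archived tarball. The bucket name and object layout follow the archiving sketch above and are assumptions, not an existing implementation:

```python
import datetime
import os

from google.cloud import storage
from jupyterhub.services.auth import HubAuthenticated
from tornado import ioloop, web

BUCKET = "bootcamp-home-archive"  # hypothetical bucket, matching the archiving sketch


class ArchiveLinkHandler(HubAuthenticated, web.RequestHandler):
    # Note: for a purely browser-facing service, the OAuth flavor
    # (HubOAuthenticated plus HubOAuthCallbackHandler) is the usual choice.
    @web.authenticated
    def get(self):
        # get_current_user() returns the Hub's user model for the caller.
        username = self.get_current_user()["name"]
        blob = storage.Client().bucket(BUCKET).blob(f"homes/{username}.tar.gz")
        url = blob.generate_signed_url(
            version="v4", expiration=datetime.timedelta(hours=1)
        )
        self.redirect(url)


def main():
    # JUPYTERHUB_SERVICE_PREFIX is set by the Hub when it manages the service.
    prefix = os.environ.get("JUPYTERHUB_SERVICE_PREFIX", "/")
    app = web.Application([(prefix + "?", ArchiveLinkHandler)])
    app.listen(10101)  # port must match the service definition in the Hub config
    ioloop.IOLoop.current().start()


if __name__ == "__main__":
    main()
```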