Using RStudio Server with Charliecloud
TL;DR
We can run RStudio Server within Charliecloud by performing ‘fake authentication’ of the RStudio user with a custom authentication script and by running the rserver binary instead of the usual RStudio service. I put together the needed scripts and an example Dockerfile in this repository.
Introduction
This post is about how to use Charliecloud in a high-performance computing (HPC) environment running the SLURM workload manager to run an RStudio Server.
Charliecloud is a lightweight containerization solution (like Docker or Singularity, which you might have heard of) and as such can be used to run your own user-defined software stack (UDSS, see also this technical paper about Charliecloud). This essentially means that you can define your very own software environment (including any programs, tools and libraries of your choosing) and distribute it to any system running Charliecloud. This makes it possible to have a fully reproducible and isolated system available anywhere (i.e. you will always have identical software versions wherever you run your container).
R and RStudio are examples of software which you might want to make fully reproducible, including any CRAN or Bioconductor packages. With RStudio Server, it is possible to create R sessions on a remote computer and access these sessions from your browser through a web interface. R itself, as you probably know, is a scripting language mainly used for statistical analysis, which makes it easy to handle datasets (e.g. cleaning, summarizing and visualizing data).
SLURM is a widely used workload manager on HPC systems. It enables users to send jobs (e.g. specific analysis tasks) to a central computing cluster, where they are queued and subsequently executed, thereby leveraging large compute resources.
So, given the information above, one might want to combine the three concepts: run R sessions accessible via a browser (hence RStudio Server) on an HPC system to leverage its resources, while having specific and well-defined R and package versions available which can easily be transferred to any system running Charliecloud for reproducibility.
The problem
Charliecloud runs unprivileged images. This means that software such as RStudio Server, which usually needs privileged access to a system, cannot be run without problems. In this post I’ll describe a way to circumvent this issue, mainly by
- not executing the RStudio daemon but the rserver binary, and
- providing a custom, lightweight authentication script
The main issue is really that RStudio tries to perform PAM authentication, which is not possible because it runs in unprivileged mode. Hence, we provide a custom authentication procedure.
The solution
We will not cover container creation for an RStudio Server in detail at this point. You just need to have Docker installed and to create and export a Charliecloud image, e.g. using the rocker/verse definitions from Docker Hub.
The most crucial point is to fake the authentication for the RStudio Server (which would usually need privileged access) for your R session. The solution is to create a random password which is shown to the executing user at server start, while the login still uses that user’s system user name.
First, put the authentication logic into a script (e.g. r-auth.sh) and copy the script to the image, e.g. under /bin/r-auth.sh. This script handles the authentication step by checking the username and the password set in the RSTUDIO_PASSWORD environment variable (which is set by the start script).
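A minimal sketch of such an authentication helper could look like the following. It assumes that rserver calls the helper with the user name as its first argument and passes the password on standard input, which is how a PAM helper is typically invoked. The function name check_login and the RSTUDIO_USER override are made up for illustration and are not part of RStudio.

```shell
#!/bin/sh
# Sketch of /bin/r-auth.sh, a stand-in for the PAM helper.
# Assumptions: rserver passes the login name as the first argument and the
# password on stdin. RSTUDIO_USER is a hypothetical override for the
# expected user name (default: the user running the server).

check_login() {
  given_user="${1:-}"
  expected_user="${RSTUDIO_USER:-$(id -un)}"

  # Refuse to authenticate anyone if no password has been prepared.
  if [ -z "${RSTUDIO_PASSWORD:-}" ]; then
    echo "RSTUDIO_PASSWORD is not set" >&2
    return 1
  fi

  # Read the password entered on the login page from stdin.
  given_password=""
  IFS= read -r given_password || true

  # Accept only the expected user with the generated password.
  if [ -n "$given_user" ] && [ "$given_user" = "$expected_user" ] \
     && [ "$given_password" = "$RSTUDIO_PASSWORD" ]; then
    return 0
  fi
  echo "Invalid authentication" >&2
  return 1
}

# rserver supplies the user name; exit status 0 means "accept the login".
if [ "$#" -gt 0 ]; then
  check_login "$1"
  exit $?
fi
```

Remember to make the script executable inside the image (chmod +x), otherwise rserver will not be able to call it.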
Once this is done, it is relatively simple to start the server.
A small start script does the job (you might want to put it into the Charliecloud image as well for convenience, e.g. under /bin/start-rstudio.sh).
Essentially, it just executes the rserver binary, specifying the ‘fake authentication’ script as the authentication helper and preparing the password beforehand.
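As a rough sketch, such a start script might look like this. It assumes that the helper lives at /bin/r-auth.sh inside the image and that rserver is on the PATH. The function names generate_password and start_rstudio are hypothetical, and depending on your RStudio Server version, rserver may need further options to run unprivileged.

```shell
#!/bin/sh
# Sketch of /bin/start-rstudio.sh.
# Assumptions: r-auth.sh is installed at /bin/r-auth.sh inside the image
# and rserver is on the PATH; the function names are made up.

# Print a random hex string built from N bytes of /dev/urandom.
generate_password() {
  head -c "${1:-12}" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

start_rstudio() {
  port="${1:-8188}"

  # The authentication helper compares the login against this variable.
  RSTUDIO_PASSWORD="$(generate_password 12)"
  export RSTUDIO_PASSWORD

  # Show the login details to the user who starts the server.
  echo "URL:      http://$(uname -n):${port}"
  echo "User:     $(id -un)"
  echo "Password: ${RSTUDIO_PASSWORD}"

  # Run the rserver binary directly instead of the RStudio service,
  # with the custom script in place of the PAM helper.
  rserver \
    --www-port="${port}" \
    --auth-none=0 \
    --auth-pam-helper-path=/bin/r-auth.sh
}

# Start only when a port is given, e.g. `start-rstudio.sh 8188`.
if [ "$#" -gt 0 ]; then
  start_rstudio "$1"
fi
```

Taking the port as an argument keeps the script reusable when several users share the same node and each needs a different port.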
The start script runs the server and shows the randomly generated password to the user.
If you now access the server via a web browser (type the IP address or name of the machine running the server, followed by the port, into the address field, e.g. http://server-name:8188 if you specified port 8188), you will see the RStudio Server login page. Enter your usual user name and the generated password shown in the terminal to log in to your very own secured R session!
NOTE: You might want to pass the port number as a parameter to your script.
NOTE 2: You might also want to consider creating your own secure-cookie-key file if you run the server on a multi-user system (which is likely the case!). You can use the --secure-cookie-key-file parameter of the rserver binary to provide your own file.
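One way to prepare such a key file is sketched below. The function name make_cookie_key and the example path are hypothetical, and the fallback branch assumes that rserver accepts any random string as key, which I have not verified.

```shell
#!/bin/sh
# Sketch: create a private cookie key file for one user.
# make_cookie_key and the example path are hypothetical; the hex fallback
# assumes rserver accepts any random string as key.

make_cookie_key() {
  key_file="$1"
  mkdir -p "$(dirname "$key_file")"

  # Create the file with owner-only permissions before writing the key.
  ( umask 077 && : > "$key_file" )
  chmod 600 "$key_file"

  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen > "$key_file"
  else
    # Fallback: 16 random bytes written as hex.
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$key_file"
  fi
}
```

You could then call, for example, make_cookie_key "/tmp/$(id -un)-rstudio/secure-cookie-key" before starting the server and pass the same path to rserver via --secure-cookie-key-file.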
Summary
That’s all! Please let me know if I missed some details you’d like to know. The main steps should be straightforward, however, and can be adjusted to your needs as required. I hope the post helped you get your RStudio Server running within Charliecloud.
Until then, farewell!