First attempt at Dockerized production Socialhome
Created by: Vir-Cotto
This is a small pull request, but there's a lot in this which we'll need to figure out.
The first thing to note is that we use a single image for all of the various Socialhome functions. That's not a normal Docker pattern, but due to the structure of Socialhome, the alternative would be to have a set of Docker images that are exactly the same except for the entrypoint. More on this a bit later.
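To make the single-image idea concrete, here is a minimal sketch of a dispatcher that one shared image could use to pick a process from a role name. The role names and the commands they map to are hypothetical placeholders, not what this PR ships.

```python
import os

# Hypothetical role -> command table. The real command lines depend on how
# Socialhome is started in this image and are assumptions for illustration.
ROLE_COMMANDS = {
    "web": ["uwsgi", "--ini", "uwsgi.ini"],
    "worker": ["python", "manage.py", "rqworker", "default"],
    "scheduler": ["python", "manage.py", "rqscheduler"],
}


def command_for(role):
    """Return the argv list for a role, or raise ValueError if unknown."""
    try:
        return list(ROLE_COMMANDS[role])
    except KeyError:
        known = ", ".join(sorted(ROLE_COMMANDS))
        raise ValueError(f"unknown role {role!r}; expected one of: {known}") from None


def run(role):
    """Replace the current process with the command for the given role."""
    command = command_for(role)
    # exec (rather than spawn) so the service receives signals directly.
    os.execvp(command[0], command)
```

Each service in the Compose file would then use the same image and differ only in the role it passes to `run`.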
Secondly, Docker Compose does not allow for multiple instances of a particular program. That is, it's not possible using Docker Compose to run multiple instances of RQScheduler. That leaves us with two alternatives: a) requiring Docker Swarm, or b) using something like Circus to manage these processes.
While that's not supported in the current setup, my suggestion would be that for small instances, mirroring the Circus setup might be the easiest thing to do. Larger installations may want to use Docker Swarm or Kubernetes, but I'm hesitant to require it.
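As a rough picture of what "several identical processes inside one container" means, here is a minimal sketch in plain Python. It is not Circus and does none of Circus's restarting or monitoring; the command used is a stand-in, and the real worker command line is an assumption left to the deployer.

```python
import subprocess
import sys


def spawn_workers(command, count):
    """Start `count` copies of `command` and return the Popen handles."""
    return [subprocess.Popen(command) for _ in range(count)]


def wait_for_all(processes):
    """Block until every process exits; return their exit codes in order."""
    return [process.wait() for process in processes]


def demo(count=3):
    # Stand-in command so the sketch runs anywhere. A real deployment would
    # pass the actual rqworker command line instead (not shown here).
    stand_in = [sys.executable, "-c", "pass"]
    return wait_for_all(spawn_workers(stand_in, count))
```

A process manager adds restart-on-failure and signal forwarding on top of this, which is the main reason to prefer an existing tool over a hand-rolled loop.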
Thirdly, I've given a lot of thought to the users that these applications run as. Docker's use of Linux containers means that there should be no easy escalation path from root inside a container to root outside it; that's the whole point of containers. Even so, I still think that privilege de-escalation is a good idea. The problem is file permissions in the data volumes: the container's view of the world (its /etc/passwd) has to be mapped onto the host system's users, specifically their UIDs and GIDs. It would be fine (and easy) to run de-escalated within the container, but file permissions can cause problems either between Docker images that don't share the same user, or between the Docker image and another system if (for some reason) the same volume were shared with it. For Socialhome, this mainly affects the media directory. I think the long-term solution (before making this official) is to simply accept this limitation, but right now, everything runs as root.
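For discussion, one possible shape for de-escalation is to let the deployer pass in the UID and GID that own the media volume on the host, and drop to them at startup. This is a sketch under assumptions: the `APP_UID` and `APP_GID` variable names and the default of 1000 are hypothetical, and nothing like this exists in the PR yet.

```python
import os


def target_ids(environ, default_uid=1000, default_gid=1000):
    """Read the UID/GID to run as from the hypothetical APP_UID/APP_GID vars."""
    uid = int(environ.get("APP_UID", default_uid))
    gid = int(environ.get("APP_GID", default_gid))
    return uid, gid


def drop_privileges(uid, gid):
    """Switch from root to uid/gid; return False if not running as root."""
    if os.getuid() != 0:
        return False
    os.setgroups([])  # drop supplementary groups while still root
    os.setgid(gid)    # group first: after setuid we could no longer change it
    os.setuid(uid)
    return True
```

Files written to the media directory would then carry numeric IDs chosen by the deployer, rather than whatever user the image happens to define.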
Fourthly, I'm suggesting that a developer using this setup run nginx on an unencrypted port 80. The reasoning is that someone deploying this software will likely be using some kind of virtual host management and adding SSL/TLS encryption at that level. By offering the site unencrypted to that service, we let it handle all the certificates itself and, if desired, do caching.
Fifth, relatedly, I'm using the nginx official Docker image with a configuration file I supply myself. This involves an extra step for the user (deployer) in setting this up. We could offer our own image.
Sixth, I'm including an exim server for smart relay. I'm very much on the fence about this. On the argument for including it, this simplifies the Socialhome setup in the sense that the Socialhome software itself would only need to know about the "smtp" server, rather than needing to talk to the actual mail server. This also eliminates the need to share authentication credentials for the mail server with the various Python components, and means that they don't even need network access to the final mail server; only the "smtp" server would need that. Lastly, it means that if the upstream mail server is down for some reason, the mail sending process (one of the rqworker processes) should still be able to complete successfully.
On the 'con' side, there are a number of negatives. Firstly, this adds an extra level of complexity, and delay, to mail sending. Secondly, one could argue that the notification system itself should handle the inability to send mail, rather than completing successfully and then having no way to learn of a later failure. Thirdly, I'm not entirely thrilled with this Docker image. It's unclear to me where Exim stores its mail queue data, and without knowing that we can't put the queue on a persistent volume, so mail might be lost if the server is shut down. There are equivalent images for Postfix (which I'm more familiar with), but they seemed more complex to set up.
I almost went down the path of writing my own LSMTP server, and then decided that this was a very bad idea.
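On the Socialhome side, pointing at the relay could look roughly like the following Django settings sketch. The setting names are standard Django ones; the `SOCIALHOME_EMAIL_HOST` and `SOCIALHOME_EMAIL_PORT` environment variable names are assumptions for illustration, as is the relay listening on port 25.

```python
import os


def email_settings(environ):
    """Build Django e-mail settings that point at the relay container."""
    return {
        "EMAIL_BACKEND": "django.core.mail.backends.smtp.EmailBackend",
        # "smtp" is the relay's service name on the Docker network.
        "EMAIL_HOST": environ.get("SOCIALHOME_EMAIL_HOST", "smtp"),
        "EMAIL_PORT": int(environ.get("SOCIALHOME_EMAIL_PORT", "25")),
        # No credentials or TLS here: only the relay talks to the real server.
        "EMAIL_HOST_USER": "",
        "EMAIL_HOST_PASSWORD": "",
        "EMAIL_USE_TLS": False,
    }


# In a settings module this could be applied with:
# globals().update(email_settings(os.environ))
```

The credentials for the real mail server would then live only in the relay's configuration.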
I don't consider this done and ready. I am giving this to you as something to discuss.
At the very least, before it's committed, it needs more testing and documentation.
Seventh and last, the production image should only contain the software needed to run, not all the software needed to build Socialhome. That means all the NPM build tools are unnecessary. There are a number of ways to do this, but I think the right way is simply to use a multi-stage build (the same way we might if we were compiling software). If we do this, we may also be able to take the opportunity to run compileall on the code, building .pyc files (.pyo on Python versions before 3.5) to improve loading time.
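The precompile step could be as small as the sketch below, which uses the standard library's `compileall` module. The directory it compiles in `demo` is a throwaway stand-in; which path holds the application code in the final image is not decided here.

```python
import compileall
import pathlib
import tempfile


def precompile(source_dir):
    """Byte-compile every .py file under source_dir; True if all succeeded."""
    # quiet=1 prints errors only; force=True recompiles even if up to date.
    return bool(compileall.compile_dir(str(source_dir), quiet=1, force=True))


def demo():
    # Throwaway directory standing in for the application code.
    with tempfile.TemporaryDirectory() as tmp:
        module = pathlib.Path(tmp) / "example.py"
        module.write_text("VALUE = 1\n")
        ok = precompile(tmp)
        compiled = list(pathlib.Path(tmp).glob("__pycache__/example.*.pyc"))
        return ok, len(compiled)
```

In a Dockerfile the same thing is usually done with `python -m compileall` as a build step in the final stage.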
To highlight the most pressing questions:
- Do we have one image with lots of entrypoints, or many images?
- How do we handle the number of rqworkers required without adding more dependencies?
- How do we want to handle users?
- Do we want to ship instructions, or our own nginx image?
- Do we want to suggest this exim container or maybe something else?
And remaining to do:
- Testing this setup - I need to test it a few more times from scratch
- Documentation - There's a lot to document, especially the bootstrapping process
- Deciding on some of the questions above
- Putting up an official Docker image for Socialhome