23 Matching Annotations
  1. Apr 2020
    1. build an image stack

      Or contribute to an existing one!

    2. Use a Dockerfile per project and publish it with a version control system

      Given the overlap in concerns, maybe this should be re-organized to sit near point (2), about using versioned Docker images? Having them together would clarify how the two concerns are different but overlapping.

    3. Kitematic

      AFAIK Kitematic is totally deprecated. I would recommend Portainer or (on Mac or Windows) the Docker dashboard.

    4. ENTRYPOINT ["python"] CMD ["/workspace/run-all.sh"]

      This is a little obtuse. Wouldn't a Python script as the CMD be more obvious / clear?
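
      For instance, something along these lines (run-all.py is a hypothetical name, not from the paper):

      ENTRYPOINT ["python"]
      CMD ["/workspace/run-all.py"]

      Then the default command clearly means "run this Python script", and it can still be overridden at docker run time.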

    5. In any case you should document different variants very well, potentially capture build and run commands in a Makefile [26]

      This seems a little inconsistent with the argument above to include commands in the Dockerfile itself (where I commented that maybe you should use Makefiles ;)

    6. You should avoid installing software packages from source after COPYing the code into the image, because the connection between the file outside of the image and the one copied in is easily lost (cf. Rule 7)

      Often, though, you are developing the package while working with it. In this case, installing over a bind mount (as you suggest) in developer mode (e.g., pip install -e dir) can be a good way to go. Such an install can be included in an entrypoint script, or just keep in mind that you can stop (rather than delete) a container and keep things configured the way you left them.
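
      A minimal sketch of that workflow (paths and image name are made up):

      # bind-mount the package source from the host and get a shell in the container
      docker run -it -v "$PWD/mypkg":/src/mypkg my-image bash
      # inside the container: editable install, so edits on the host are picked up immediately
      pip install -e /src/mypkg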

    7. bind405mounts

      There are performance considerations here as well - bind mounts save space and perform just as well as volumes on Linux. You can unfortunately get very bad performance on Docker for Win / Mac, and you might want to use a volume mount containing your data or similar. WSL2 on Windows also promises improved performance.
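
      For example (names are illustrative):

      # named volume: managed by Docker, noticeably faster I/O on Docker for Win/Mac
      docker run -v mydata:/data my-image
      # bind mount: shares a host directory directly, can be slow on Docker for Win/Mac
      docker run -v "$PWD/data":/data my-image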

    8. Conda

      And also pip, ya? This feels like pushing people towards Conda for funny reasons (I tend to use pip because it is faster)

    9. RUN pip install geopy==1.20.0 && pip install uszipcode==0.2.2

      Is there a reason you're recommending two separate pip invocations here? Since it's the same layer, they'll still both run if you change the command. If you give pip all requirements at once, there's less chance of version thrashing (potentially even installing incompatible versions of some things)
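
      i.e., a single invocation such as this instead (same packages as in the quoted listing):

      RUN pip install geopy==1.20.0 uszipcode==0.2.2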

    10. You should regularly re-build the image using the --no-cache option

      And perhaps make sure to tag your good / working image before you do!
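
      e.g. (image name and tags are placeholders):

      docker tag my-image:latest my-image:known-good
      docker build --no-cache -t my-image:latest .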

    11. Therefore you should add instructions in order of least likely to change to most likely to change

      One complaint about Docker is that it is slow. If you tend to append while building your image, your iterations will be fast. You can re-organize the layers at the end.

      Your guidelines still seem good even when you're iterating! But I'd lean towards appending at the end until I figure things out.
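
      A rough sketch of the final ordering (base image and package names are just placeholders):

      FROM python:3.8-slim
      # stable dependencies near the top (least likely to change)
      RUN pip install numpy==1.18.2
      # frequently edited code last, so the earlier layers stay cached
      COPY analysis.py /workspace/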

    12. volume mounts, specific names, or ports are important for using the container, see for example the final lines of Listing 1

      It's also reasonable to capture this information in external commands - scripts, a Makefile, or a docker-compose.yaml, for example (all of these allow the use of relative paths, which aren't allowed by the docker command directly).
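
      e.g., a small run.sh wrapper (names and port are made up) so nobody has to remember the flags:

      #!/bin/sh
      # docker run -v needs absolute paths; $(pwd) expands the relative path for the caller
      docker run --rm -p 8888:8888 -v "$(pwd)/data":/workspace/data my-image "$@"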

    13. ARG

      I would introduce ARG before including it in code, or explain it immediately after. It's not part of the "core" that everyone familiar with Docker will know
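
      Something like this, perhaps (the variable name is made up):

      # build-time variable with a default; override with: docker build --build-arg BASE_TAG=3.7-slim .
      ARG BASE_TAG=3.8-slim
      FROM python:${BASE_TAG}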

    14. custom metadata to images

      Similarly, ENV can provide metadata to programs running inside the container (I don't think you can access LABELs from inside?)
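
      For instance (keys and values are made up):

      # LABEL is image metadata, readable from outside via docker inspect
      LABEL org.example.analysis.version="1.0.0"
      # ENV is visible to processes running inside the container
      ENV ANALYSIS_VERSION=1.0.0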

    15. one scoped action

      one scoped, documented action?

    16. keep the script in the container for a future user to inspect

      ...and a script is really small so it's not a big deal for size concerns!

      You might also note that if you use Docker's COPY command, you can never get rid of the data even if you delete it - it'll hang around in the COPY layer.
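
      A sketch of the difference (URL and paths are placeholders):

      # data brought in with COPY stays in that layer forever, even if deleted later
      COPY big-dataset.zip /data/
      # whereas data fetched and removed within a single RUN never ends up in any layer
      RUN wget https://example.org/big-dataset.zip \
          && unzip big-dataset.zip -d /data \
          && rm big-dataset.zip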

    17. especially when connecting multiple commands in a RUN instruction with &&

      Inspired by the standard style guide in Elm, I tend to put connecting syntactic elements at the beginning, e.g.:

      RUN some-command \
          && another-command

      This can dramatically reduce the chance of accidentally removing a needed && or leaving one lingering around...

    18. Do not docker push a locally built image,

      I think that doing a docker push of a locally built image is generally fine... and probably better for archival purposes (with caveats) than not doing it?

      There are other ways to address security concerns - e.g., running the container in a cloud docker service? (and I'm not shilling for Gigantum here - we don't quite support this very well unless you create your own base, which is more complex than just publishing to a registry)
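
      e.g. (registry user, image name, and tag are placeholders):

      docker tag my-image:latest myuser/my-analysis:2020-04
      docker push myuser/my-analysis:2020-04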

    19. for images that you build yourself and then run

      This seems worth expanding - how do you do this? (I know you specify it in the docker build command, but you could give an example, just as you include a versioned FROM later on)
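
      Something along these lines, maybe (names and version are made up):

      docker build -t myuser/my-analysis:1.2.3 .
      docker run myuser/my-analysis:1.2.3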

    20. only use images where you have access to the Dockerfile

      How do you verify the Dockerfile was actually used to build the image?

    21. optimised for high performance computing

      I would say that it's optimized for the security needs of traditional HPC environments. People use Docker (esp. Kubernetes) in novel HPC contexts, and there is even national infrastructure that supports Docker!

    22. Research Software Engineers (RSEs) are not the target audience for this work, but we want to encourage you to reach out to your local or national RSE community if your needs go beyond the rules of this work.

      Not sure why you'd want to suggest RSEs not read the paper? Even if it's head-nodding in total agreement, presumably they might use this as a resource at least?
