
Optimize a Docker .NET Core project and docker-compose solution to reduce image build time

Docker has a great built-in cache mechanism, but to use it effectively, you have to understand how it works. Let's dive into it to build .NET Core Docker images faster.

How the Docker cache works

First, we should understand how the Docker cache works. This part is described nicely in the official documentation:

  • Starting with a base image that is already in the cache, the next instruction is compared against all child images derived from that base image to see if one of them was built using the exact same instruction. If not, the cache is invalidated.
  • In most cases simply comparing the instruction in the Dockerfile with one of the child images is sufficient. However, certain instructions require a little more examination and explanation.
  • For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the file(s) are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated.
  • Aside from the ADD and COPY commands, cache checking will not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command the files updated in the container will not be examined to determine if a cache hit exists. In that case just the command string itself will be used to find a match.

Once the cache is invalidated, all subsequent Dockerfile commands will generate new images and the cache will not be used.

So Docker always checks whether a given instruction starts in exactly the same situation as in the past. If it does, the instruction is not executed; instead, that layer is taken from the cache. If even one bit has changed, for example one copied file differs, then the cache is invalidated and the instruction is actually executed.

How to decrease image build time

To build images faster, put everything which rarely changes above things which change often. For example, when you compile your application inside a Dockerfile, your source files change often, which invalidates the cache all the way down. But you probably download some dependencies, for example NuGet packages, which change less often than source code, so you can put that step before compilation. This way, as long as you stick to existing dependencies, they are taken from the cache. When you add a new dependency or change the version of an existing one, the cache is invalidated and packages are downloaded from the Internet (and a new cache layer is created).
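As an illustration, here is a minimal sketch of this ordering (the apt-get layer stands in for any rarely changing setup step; the image tag is just an example):

FROM mcr.microsoft.com/dotnet/core/sdk:3.1
# Rarely changes - stays in the cache across rebuilds
RUN apt-get update && apt-get install -y --no-install-recommends curl

# Changes often - everything from here down is rebuilt whenever a source file changes
WORKDIR /app
COPY . .
RUN dotnet publish -c Release -o out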

Ignore what you don’t need

When you COPY files from the build context into the image, Docker calculates a checksum for them. If exactly the same files were copied to exactly the same directory in the Docker image, then this layer (as every operation in a Dockerfile creates a new layer) is taken from the cache. So copy only those files which you really need, because otherwise you can invalidate the cache in vain, with files which are not required in your image at all, for example every Dockerfile, or files generated by your IDE, and so on. Remember that the .dockerignore file has to be located in the build context root. Example rules:

*/bin
*/obj
build
**/.toolstarget
.env
*Test*

# Docker files (in this example, Docker scripts and docker-compose files live in the 'Docker' directory, in the context root)
.dockerignore
Docker
**/Dockerfile
**/docker-compose*

# Git files
.git
.gitignore
.gitattributes

# IDE files
.vs
.vscode
.idea

# Additional files
**/Samples

Try to create your own cache (yarn build example)

When you know where your bottleneck is and there is no obvious way to reduce it, try to figure out some non-obvious solution. For example, a few projects in one of the products which I maintain have yarn install and yarn build sewn into the csproj file, like this:

<Target Name="PublishRunWebpack" AfterTargets="ComputeFilesToPublish">
  <Exec WorkingDirectory="$(SpaRoot)" Command="yarn install" />
  <Exec WorkingDirectory="$(SpaRoot)" Command="yarn build" />
</Target>

but this was extremely slow - in Docker for Windows it took around 4 minutes. The web files in those projects changed rarely; usually the C# code changed, so I wanted to reuse the yarn build results. I didn't find a way to use the npm/yarn cache in this situation, so I cached it myself. I moved copying the web files to the top of the Dockerfile, like this:

WORKDIR /app/WebClient/WebFiles
# The three layers below are taken from the cache as long as you don't change files in the WebFiles directory
COPY WebClient/WebFiles .
RUN yarn install
RUN yarn build

and changed the csproj like this:

<Target Name="PublishRunWebpack" AfterTargets="ComputeFilesToPublish" Condition=" '$(Configuration)' == 'Release' And !Exists('$(ProjectDir)WebFiles/build') ">
  <Exec WorkingDirectory="$(SpaRoot)" Command="yarn install" />
  <Exec WorkingDirectory="$(SpaRoot)" Command="yarn build" />
</Target>

So Docker cached the yarn build output, and it was invalidated only when the web files changed. Thanks to the target condition, the C# project reused the already existing yarn build output. Of course, because COPY WebClient/WebFiles was at the top of my Dockerfile, every change to a web file invalidated the cache all the way down; if those files changed often, I would do it some different way, but in my case it worked as expected. I know that in theory an incorrect build directory could somehow exist, but it's only theory: in Docker the situation is repeatable, and on developer machines publish was never invoked at all, only the Debug configuration was built, so in practice it's not a risk. Maybe with incremental build I could achieve the same in a slightly cleaner way - if your case is similar to mine, you can try it.

Reuse what you can

If there is something you can reuse, do it. For example, when I have multiple Docker images in one solution, I build them all at once instead of building every project separately; I have observed that this way it's faster. For example, I have a solution Application.sln which contains 5 executable projects (which are packed into separate Docker images) and additional projects like tests, usage samples and so on. To lower the number of projects to build, I copied it to Application-minimal.sln and left only the necessary projects. Then in every Dockerfile, instead of compiling only one chosen project, like this: RUN dotnet publish PROJECT_NAME -c Release -o out, I compile the whole solution, like this: RUN dotnet publish Application-minimal.sln -c Release. After compilation, I copy the whole build output (with all projects) to the final image (see Docker multi-stage build explained), for example this way: COPY --from=build-env /app/build . Finally, at the very bottom of the Dockerfile, I invalidate the cache by specifying which project I want to run:

# Pay attention to use the correct path here, especially if you set a custom build output path
WORKDIR /app/ProjectA/publish
ENTRYPOINT ["dotnet", "ProjectA.dll"]
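Putting it together, one of the per-project Dockerfiles might look roughly like this (the image tags are just examples, and the sketch assumes the publish output of all projects lands under /app/build, e.g. through a custom output path in the csproj files):

FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build-env
WORKDIR /app
COPY . .
# Identical in every Dockerfile of the solution, so after the first image is built,
# the remaining images take the whole compilation from the cache
RUN dotnet publish Application-minimal.sln -c Release

FROM mcr.microsoft.com/dotnet/core/aspnet:3.1
WORKDIR /app
# Also identical in every Dockerfile - still cached
COPY --from=build-env /app/build .
# Only the lines below differ between the images
WORKDIR /app/ProjectA/publish
ENTRYPOINT ["dotnet", "ProjectA.dll"]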

Everything above those lines is reused from the cache, so when you want to build n projects, only the first one takes a while to compile; the others are taken practically whole from the cache. In my case I have observed that it works faster than doing it separately (running dotnet publish PROJECTX for every project). It will probably be faster in your case too, but as with every optimization, I encourage you to measure before applying it :)

Experiment with caching NuGet dependencies

On your developer PC, NuGet packages are cached locally. But in Docker you have a clean situation every time you invoke a build, so NuGet packages are downloaded from the Internet every time. It's good to cache them, because we don't change project dependencies as often as project source files. When you compile only one project, the situation is simple; from the project directory you can use:

# The two layers below are taken from the cache as long as you don't change your project file
COPY *.csproj ./
RUN dotnet restore

# Copy everything else and build
COPY . ./
RUN dotnet publish -c Release -o out

But when you want to restore packages for the whole solution, it gets more complicated, because (at the time of writing) you cannot point the dotnet restore command at a .sln file, and you cannot point it at multiple csproj files either. Fortunately, you can resolve it in many ways, for example with a simple bash script:

#!/bin/bash
# It can also be written as a one-liner straight in the Dockerfile
for projectName in *.csproj; do
	dotnet restore "$projectName"
done

used this way:

# These two layers are always taken from the cache
COPY restore-all.sh .
RUN chmod +x restore-all.sh

# These two layers are taken from the cache as long as you don't change your csproj files
COPY */*.csproj ./
RUN ./restore-all.sh
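If you prefer the one-liner mentioned in the script's comment, the same loop can be inlined in the Dockerfile, without copying a separate script:

COPY */*.csproj ./
RUN for projectFile in *.csproj; do dotnet restore "$projectFile"; done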

Alternatively: build image from dedicated container

Some people do not compile the application in the Dockerfile, but on their own PC, and just put the build output into the image. This is faster, but it has one big downside: you're not 100% sure that the image was built properly, and generally speaking how it was made at all, so I would never want to put such an image on production. But I have also heard about another solution: to create a container dedicated to producing the input for the Docker image... It sounds a bit like overengineering, and in many cases it probably is, but maybe sometimes it's worth doing. You have more options in a container; for example, you can mount directories, so there is no problem with any cache you need. You can even reuse an already existing cache from the host machine. I have never tried such an approach, as the methods above work well for me, but maybe sometimes it's worth organizing a CI/CD pipeline this way.
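I have never tested it, but such a build container could look roughly like this (the paths and the image tag are only illustrative), reusing the host's NuGet cache through a mounted volume:

# Compile in a dedicated SDK container instead of in the Dockerfile,
# mounting the host's NuGet cache so dependencies aren't downloaded again
docker run --rm \
  -v "$PWD":/src \
  -v "$HOME/.nuget/packages":/root/.nuget/packages \
  -w /src \
  mcr.microsoft.com/dotnet/core/sdk:3.1 \
  dotnet publish -c Release -o out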

Summary

Proper caching in a Dockerfile is quite a broad subject, so here I have only touched on it in general terms. The point of this article is not only to give you some ready-to-use tricks (like those with yarn or NuGet packages), but also to show you how the cache in Docker works and to encourage you to investigate your own situation. Pay attention to every build layer, measure which step takes too much time, and find your bottleneck. Sometimes one line moved in a Dockerfile, or one file added to .dockerignore, can make a (really) big difference. And sometimes a small investigation can resolve a huge problem, for example a .dockerignore file stored somewhere other than the build context root directory.
If you have a feeling that the cache caused some problem (which in theory may be the case, but I have never had such a situation), you can always check how Docker builds with the --no-cache parameter, which forces everything to be built from scratch.
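For example (the tag is just an illustration):

docker build --no-cache -t my-image .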

I hope the article was helpful for you. Please share it on Facebook or Twitter, and of course ask me anything which is still not clear. If you know some other trick related to the cache, please share it in the comments; someone will surely appreciate it :)


Tometchy

Passionate about agile software development and decentralized systems
