Backing up a Dockerized Community Edition Neo4j Database
Neo4j is a platform for managing graph databases. The Montana Climate Office uses Neo4j Community Edition to manage a database of satellite data associated with each of our Mesonet stations (learn more on our GitHub). The database is queried by the Montana Mesonet Dashboard to plot and contextualize current conditions at each Mesonet station. The database is extremely performant and can query over 20 years of data at a given station in a fraction of a second! The database is stood up in a Docker container for ease of mobility and reproducibility.
The Problem
While we were developing the database, we ran into a roadblock when implementing a method to operationally backup the data. There are numerous tutorials online explaining how to backup a Neo4j database, but they all rely on the paid Enterprise Edition of Neo4j. The Enterprise Edition simply allows you to run the neo4j-admin backup
command on an active database to dump the data into an archived file. In the Community Edition however, you have to use the neo4j-admin dump
command on a stopped database to backup the data. While this seems simple enough, problems arise when using this method within a Docker container. To run the neo4j-admin dump
command, you first have to run neo4j stop
to stop all the databases. Because the default Neo4j Docker image depends on Neo4j to be up and running, this causes the Docker container to shut down, making it impossible to run the neo4j-admin dump
command.
Our Solution
To work around this problem, we extended the Neo4j Docker image so the container wouldn’t stop when Neo4j was stopped. Then, we used cron
to run a backup script every night. Here is what the new Dockerfile
looks like:
FROM neo4j:4.4.8-community
RUN apt-get update && \
apt-get -y install cron
# Copy the backup script into the container.
COPY ./backup.sh /opt/backup.sh
# Make a directory to store the backed up data in.
RUN mkdir -p /dumps/neo4j
# Give execution rights on the cron job
RUN chmod u+x /opt/backup.sh
# Copy environment and shell into the crontab
RUN env >> ~/env.log
RUN echo 'PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' > /etc/crontab
RUN echo 'SHELL=/bin/bash' >> /etc/crontab
# Make our backup script run at 6am UTC every day.
RUN echo '0 6 * * * root bash -c "source $HOME/env.log; source /opt/backup.sh"' >> /etc/crontab
# Create the log file to be able to run tail.
RUN touch /var/log/cron.log
# Run the tail on the cron log so the container won't stop when Neo4j is stopped.
CMD tail -f /var/log/cron.log
The backup script that is copied into the Docker container simply stops Neo4j, dumps the latest snapshot of the database into the /dumps/neo4j
folder, then restarts Neo4j:
#!/bin/bash
/var/lib/neo4j/bin/neo4j stop
/var/lib/neo4j/bin/neo4j-admin dump --database=neo4j --to=/dumps/neo4j/backup_$(date +%Y%m%d%H%M).dump
/var/lib/neo4j/bin/neo4j start
We run the Neo4j container with another Docker container that updates the database with new satellite data, so we use docker compose
to stand up the database. Here is what the relevant portions of our docker-compose.yml
look like:
version: '3.9'
services:
neo4j:
container_name: neo4j
restart: on-failure
build: ./neo4j
volumes:
# Map volumes to machine so data is persisted between containers.
- /neo4j/data:/data
- /neo4j/logs:/logs
- /neo4j/import:/var/lib/neo4j/import
- /neo4j/plugins:/plugins
# The db-backups volume points to an external NAS server to
# store the data on.
- db-backups:/dumps
ports:
- "7474:7474"
- "7687:7687"
env_file:
- .env
volumes:
db-backups:
driver: local
driver_opts:
type: nfs
o: nfsvers=4,addr=${nfsIP},nolock,soft,rw
device: $nfsPath
Unfortunately, we are so far unable to find a method for starting the cron
service that operationally runs the backups within the Dockerfile
without breaking Neo4j. To address this, we use a deploy script that starts the container and then starts cron
after the container is stood up:
#!/bin/bash
# Get latest changes
git pull origin main
# Start docker containers
docker compose up --build -d
# Start cron
docker exec -it neo4j service cron start
# Make sure Neo4j is running
docker exec -it neo4j neo4j start
Restoring the Database
Now, if the data get deleted, your server crashes, or anything else happens to your data, you have a backup that can be used to restore the database. To restore the database, you just have to run the neo4j-admin load
command before deploying the database with docker compose:
docker run --interactive --tty --rm \
--volume=path/to/data/in/docker-compose:/data \
--volume=path/to/dumps/in/docker-compose:/dumps \
neo4j:4.4.8 \
neo4j-admin load --database=neo4j --from=/dumps/<name_of_latest_backup>.dump
Running the above command reads the latest .dump
file and puts the data into the /data
folder in a format that is readable by Neo4j. After running this command, running the above deploy script will stand up the database with data from the latest backup!