Seed Your MongoDB Container with JSON data using Docker and Docker-Compose
Requirements
A couple of years ago I made a frontend app that rendered JSON data stored in-memory. The app provided a checklist for Quality Management called QualiExplore (under the Apache 2.0 License).
Nothing too fancy, but we needed to make the app a bit more flexible, so we decided to use MongoDB both to serve the information and to make it easy to introduce new entries to the JSON data that previously persisted in the app.
Fundamentally, the app should be usable out-of-the-box for anyone who wants to deploy it (with other additions currently ongoing, e.g., a GraphQL backend plus user authentication / authorization).
Problem
The JSON data that previously persisted in the app needs to be seeded into the MongoDB instance once the complete stack is brought up.
Doesn't sound like a big deal; it should be possible to use volume mounts with Docker, right? Yup, you guessed it, but MongoDB doesn't just ingest data because one mounted a volume to a specific directory. To the rescue comes the mongoimport tool, which lets one load data via the CLI. Seems like our problem is solved!
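As a quick illustration, a standalone mongoimport call looks roughly like this (host, database, collection, and file names here are hypothetical, not from the project):

mongoimport --uri "mongodb://localhost:27017/mydb" \
    --collection items --drop --file items.json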
Well, it turns out I wanted to seed data into MongoDB in a slightly more secure manner, i.e., use a username and password and a dedicated initialized database where this imported data should persist under specific collections.
Turns out, once you figure out the mongoimport tool's syntax, the next big headache is figuring out how to securely pass this information to a Docker image that will insert it into Mongo.
I generally follow a stack design pattern where I try to keep the changes an end-user has to make to the YAML files to a minimum and let changes flow in through the environment variables passed to the stack. So passing the environment variables to the respective Docker containers shouldn't be a big hassle, right? Oh, how wrong I was!
Approach
I decided to follow these sequential steps (a plain-docker sketch of the same flow follows the list):
- Bring the MongoDB container up
- Create a Docker container, connected to the same network as the MongoDB container above, that has mongoimport on it
- Mount the JSON data into the container from step 2 and insert the data
- To make the ingestion of data a bit more secure, pass information like the database's URI and credentials via environment variables
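To make the sequence concrete, here is roughly what those steps look like as plain docker commands (a minimal sketch; the image tag mongo-seed is a name picked for illustration, and the compose file later automates all of this):

# credentials and the database URI are omitted here; the compose
# setup below passes them via build args and environment variables
docker network create qualiexplore_net
docker run -d --name mongodb --network qualiexplore_net mongo:latest
docker build -t mongo-seed ./mongo_seed
docker run --rm --network qualiexplore_net mongo-seed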
So the initial project setup is as follows:
.
├── docker-compose.yml
├── mongo_seed
│   ├── Dockerfile
│   ├── factors.json
│   └── filters.json
└── .env
Here, mongo_seed is the directory with the data to be ingested as well as the Dockerfile that will perform the necessary ingestion. The root of the directory contains the docker-compose.yml for the stack and its respective .env file for environment variables.
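One thing worth noting about these data files: the import command shown later doesn't pass --jsonArray, and by default mongoimport expects one JSON document per line (newline-delimited JSON). Hypothetical contents of factors.json, purely to illustrate that shape (the field names are made up, not QualiExplore's real schema):

{ "name": "Factor A", "description": "..." }
{ "name": "Factor B", "description": "..." }

If your file is instead a single JSON array, add the --jsonArray flag to mongoimport.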
Problems Faced
This requires a quick primer on Docker's ARG and ENV in the context of Dockerfiles. A decent read-through I used can be found here.
In a nutshell: ARG is used during build-time, i.e., when building a Docker image; its lifetime is short, lasting only until the build is complete. Once the image is built, ARG values are not available for further usage. ENV, on the other hand, remains available during the run-time of the container too.
A common practice that a lot of container developers use is to declare an ARG and then store the value of the incoming build argument in an ENV for further usage.
So, something like:
ARG database_uri
ENV DATABASE_URI=${database_uri}
# Use $DATABASE_URI wherever the database URI is needed
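In principle, the hand-off is observable like this (a usage sketch, assuming an image built from a Dockerfile containing the two lines above on a Linux base; the tag arg-env-demo is arbitrary):

docker build --build-arg database_uri="mongodb://localhost:27017/test" -t arg-env-demo .
docker run --rm arg-env-demo printenv DATABASE_URI
# expected output: mongodb://localhost:27017/test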
In my setup, however, this wasn't the case: I ended up facing a problem, which I documented on StackExchange.
The problem, concisely, is that the environment variables from the .env file passed into the mongo-seed container don't get set during run-time.
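To make the failure concrete, here is a simplified reconstruction (not my actual file) of the naive Dockerfile one might start with. An ARG has already expired by the time CMD executes, so the shell expands ${DATABASE_URI} to an empty string at run-time:

FROM mongo:5.0
ARG DATABASE_URI
COPY factors.json /factors.json
# ARG is build-time only: at run-time the shell finds no DATABASE_URI
# environment variable, and mongoimport receives an empty --uri
CMD mongoimport --uri ${DATABASE_URI} --collection factors --drop --file /factors.json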
Solution
After going through some StackExchange queries, an answer from Zincfan on a query about getting an environment variable in a Dockerfile turned out to be the solution.
Steps
- Declare an ARG and an ENV with the same name as the environment variable
- Get the value from the build argument and store it in the environment variable using the syntax:
ARG DATABASE_URI
ENV DATABASE_URI ${DATABASE_URI}
- Use ${DATABASE_URI} in the run-time of the container
- Pass the information from the .env file to the compose file in the following way:
services:
  mongo-seed:
    build:
      args:
        - DATABASE_URI=$DATABASE_URI
Files
So finally my Dockerfile looks like the following:
FROM mongo:5.0
# Will be set through Environment Files
ARG DATABASE_URI
ARG USERNAME
ARG PASSWORD
ENV DATABASE_URI ${DATABASE_URI}
ENV USERNAME ${USERNAME}
ENV PASSWORD ${PASSWORD}
COPY factors.json /factors.json
COPY filters.json /filters.json
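# Import both JSON files into their collections; --drop clears each
# collection first, so re-running the seed container is idempotent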
CMD mongoimport --username ${USERNAME} --password ${PASSWORD} --uri ${DATABASE_URI} --collection factors --drop --file /factors.json && \
mongoimport --username ${USERNAME} --password ${PASSWORD} --uri ${DATABASE_URI} --collection filters --drop --file /filters.json
Good to note: never split the commands into two CMD instructions in the file; Docker only honors the last CMD, so merge multiple commands into one CMD.
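For illustration, if the file had ended with two separate CMD instructions, Docker would silently keep only the last one (a sketch, with the flags elided):

CMD mongoimport ... --collection factors --drop --file /factors.json
CMD mongoimport ... --collection filters --drop --file /filters.json
# only the second CMD would ever run; factors would never be imported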
My docker-compose.yml file looks like the following:
services:
  # MongoDB
  mongo:
    container_name: mongodb
    image: mongo:latest
    env_file:
      - .env
    ports:
      - "27017:27017"
    networks:
      - "qualiexplore_net"
  # Initial Seed to QualiExplore Database
  mongo-seed:
    env_file:
      - .env
    build:
      context: ./mongo_seed
      dockerfile: Dockerfile
      args:
        - DATABASE_URI=$DATABASE_URI
        - USERNAME=$MONGO_INITDB_ROOT_USERNAME
        - PASSWORD=$MONGO_INITDB_ROOT_PASSWORD
    depends_on:
      - mongo
    networks:
      - "qualiexplore_net"

# top-level network definition, required so both services can share it
networks:
  qualiexplore_net:
My .env file is as follows:
MONGO_INITDB_ROOT_USERNAME=root
MONGO_INITDB_ROOT_PASSWORD=example
MONGO_INITDB_DATABASE=qualiexplore
DATABASE_URI=mongodb://mongodb:27017/qualiexplore?authSource=admin
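Once the stack is up, you can verify the seed worked, for instance with a quick check like this (using the credentials from the .env above and the collection names from the Dockerfile):

docker compose up -d --build
docker exec mongodb mongosh -u root -p example --authenticationDatabase admin \
    --eval "db.getSiblingDB('qualiexplore').factors.countDocuments()"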
Comments, Suggestions
If you would like to suggest changes to my design or a better way to tackle this situation, I am all ears and would love to improve and discuss aspects. You can contact me on LinkedIn or via e-mail.