The problem
Automated building and testing is essential in modern software development, and that can mean running your CI/CD pipelines hundreds of times per day. To do this at scale you'll need containerised reference databases, and you'll need to manage them in an organised way.
You'll want to minimise the time it takes to create these databases, and to delete them promptly to avoid accumulating the potentially considerable costs of forgotten but still active resources.
You'll want to make those databases task-specific. Attempting to use one database for multiple purposes is a recipe for collision, chaos, and confusion.
Finally, you might like to base those reference databases on your real data, but with PII and other sensitive data replaced, keeping them up to date and in sync with your codebase. You've probably realised that you'd rather not create such a framework yourself. So what are your choices?
The solution
Tonic Ephemeral, a.k.a. 'Databases on Demand', is just what you've been looking for. Ephemeral lets you build an encyclopaedic library of database snapshots, from which you can spin up a precisely configured, containerised database server in seconds. A database server can be terminated when no longer required, and configured to auto-terminate to avoid unnecessary costs. Because Ephemeral databases can be created quickly and dynamically, they can be made task-specific, avoiding the friction of multi-use contention. You can use our online service or self-host, depending on your security preferences. The Ephemeral API allows you to quickly and easily integrate Ephemeral into your development pipelines. Ephemeral is tightly integrated with Tonic Structural, and can therefore be used to generate sanitised, appropriately sized (via subsetting) database images precisely tuned to the branches and needs of your codebase.
This article will show you how it's done so you can start working with Ephemeral today. Read it in conjunction with our GitHub repository, EphemeralAPIExamples, which provides fully worked scripts for creating Ephemeral snapshots and database servers. In the article we quote only the important elements of these scripts, to keep things readable; for complete, runnable examples, please refer to the scripts in their entirety.
Snapshots
Before we start, what is a snapshot anyway? The easiest way to think about a snapshot is as a file. It's a file that contains the data you need in your database but without being a database server. That's a big difference.
A database server has CPU, memory, and a real or virtual disk and network card. A snapshot has none of those things, which is why there's an order-of-magnitude difference in cost. At today's prices, the smallest capacity AWS RDS Postgres server will cost you ~$180/month, but if you were only paying for the storage (100GB), that's $0.08/GB/month or $8/month in total. So keeping a database server active when you don't need it is ~20x more expensive than it needs to be, and you may not need 100GB of storage either.
Additionally, you can create a library of snapshots at a scale which would be impossible if working with actual servers. That's the beauty of snapshots.
Hydration
Hydration is the process of creating a database server using a snapshot. To do this using the Ephemeral API, first construct a configuration object.
DB_SERVER_CONFIG=$(cat <<DOC
{
  "name": "${EPHEMERAL_DB_SERVER_NAME}",
  "expiry": {
    "expiryType": "Static",
    "durationEnd": {
      "minutesFromStartToExpiry": 360,
      "minutesFromLastActivityToExpiry": 180
    }
  },
  "volumeSnapshotId": "${SNAPSHOT_ID}",
  "storageSizeInGigabytes": 10
}
DOC
)
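As an optional sanity check (an addition of ours, not part of the repository scripts), you can confirm the heredoc produced valid JSON by piping it through jq before posting it:
# Pretty-print the payload; jq exits non-zero if the JSON is malformed
echo "$DB_SERVER_CONFIG" | jq .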
Here we identify a snapshot by its id. The rest of the payload configures the database server. Note that to configure the lifetime of the database server, we use the expiry object. We give it a maximum fixed lifetime (minutesFromStartToExpiry), with additional time built in to allow for termination of usage (minutesFromLastActivityToExpiry). We can also limit activation to specific business hours.
To create the database server, we post the above object to the /api/database endpoint of our Ephemeral server.
CREATE_EPHEMERAL_DB_URL="${TONIC_EPHEMERAL_SERVER}:${TONIC_EPHEMERAL_API_PORT}/api/database"
CREATE_EPHEMERAL_DB_RESPONSE=$(
  curl -X 'POST' \
    "$CREATE_EPHEMERAL_DB_URL" \
    -H 'accept: */*' \
    -H "Authorization: ${TONIC_EPHEMERAL_API_KEY}" \
    -H 'Content-Type: application/json' \
    -d "$DB_SERVER_CONFIG" \
    2>/dev/null
)
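The repository scripts assume this call succeeds; in a pipeline you may want to fail fast if it does not. A minimal guard (again an addition of ours, not from the scripts):
# Abort early if the create request produced no response at all
if [[ -z "$CREATE_EPHEMERAL_DB_RESPONSE" ]]
then
  echo "Database creation request returned no response - check server address and API key" >&2
  exit 1
fi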
As a side note, you will need an API key; we cover this near the end.
The call above returns a database server id, which you can use to monitor the status of the request via the /api/database/${DATABASE_ID} endpoint. In the code below, we wait for the state to change to "Running". The jq utility is used for JSON operations.
EPHEMERAL_DATABASE_ID=$(echo "$CREATE_EPHEMERAL_DB_RESPONSE" | sed 's/"//g')
# Get database config
DATABASE_DETAILS_URL="${TONIC_EPHEMERAL_SERVER}:${TONIC_EPHEMERAL_API_PORT}/api/database/${EPHEMERAL_DATABASE_ID}"
echo "Request made"
echo
echo "Now checking job status - need to wait for 'Running' state"
echo
while true
do
  DATABASE_DETAILS_RESPONSE=$(
    curl -X 'GET' \
      "$DATABASE_DETAILS_URL" \
      -H 'accept: */*' \
      -H "Authorization: ${TONIC_EPHEMERAL_API_KEY}" \
      -H 'Content-Type: application/json' \
      2>/dev/null
  )
  DB_STATUS=$(echo "$DATABASE_DETAILS_RESPONSE" | jq -r '.status')
  if [[ "$DB_STATUS" == "Running" ]]
  then
    echo "Snapshot ${DB_SNAPSHOT_NAME} is available via db server ${EPHEMERAL_DB_SERVER_NAME}"
    echo
    break
  else
    echo "Job status is ${DB_STATUS}"
  fi
  sleep 10
done
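In a CI pipeline you will usually want this wait to be bounded rather than open-ended, so a stuck job fails the build instead of hanging it. A minimal sketch of a bounded variant, reusing the variables defined above (this is our addition, not part of the repository scripts):
MAX_ATTEMPTS=60   # 60 attempts x 10 second sleep = 10 minute ceiling
for (( ATTEMPT=1; ATTEMPT<=MAX_ATTEMPTS; ATTEMPT++ ))
do
  DATABASE_DETAILS_RESPONSE=$(
    curl -X 'GET' \
      "$DATABASE_DETAILS_URL" \
      -H 'accept: */*' \
      -H "Authorization: ${TONIC_EPHEMERAL_API_KEY}" \
      2>/dev/null
  )
  DB_STATUS=$(echo "$DATABASE_DETAILS_RESPONSE" | jq -r '.status')
  [[ "$DB_STATUS" == "Running" ]] && break
  echo "Attempt ${ATTEMPT}/${MAX_ATTEMPTS}: status is ${DB_STATUS}"
  sleep 10
done
if [[ "$DB_STATUS" != "Running" ]]
then
  echo "Timed out waiting for database server to reach 'Running'" >&2
  exit 1
fi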
Database server credentials
Once your database server is in the Running state, you can obtain the credentials from the $DATABASE_DETAILS_RESPONSE object and use them in your test pipeline.
DB_USER=$(echo "$DATABASE_DETAILS_RESPONSE" | jq -r '.databaseUserName')
DB_PASSWORD=$(echo "$DATABASE_DETAILS_RESPONSE" | jq -r '.databasePassword')
DB_NAME=$(echo "$DATABASE_DETAILS_RESPONSE" | jq -r '.databaseName')
DB_HOST=$(echo "$DATABASE_DETAILS_RESPONSE" | jq -r '.hostname')
DB_PORT=$(echo "$DATABASE_DETAILS_RESPONSE" | jq -r '.port')
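A quick connectivity check at this point is a useful guard before handing the server to your test suite. A minimal sketch, assuming a PostgreSQL snapshot and the psql client installed (this check is our suggestion, not part of the repository scripts):
# Verify that we can reach the hydrated server and run a trivial query
PGPASSWORD="$DB_PASSWORD" psql \
  -h "$DB_HOST" \
  -p "$DB_PORT" \
  -U "$DB_USER" \
  -d "$DB_NAME" \
  -c 'SELECT 1;'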
Data generation
A good question to ask is: how do I create a snapshot in the first place? Conveniently, Tonic Structural has the answer.
We start by assuming you have a Structural workspace set up which connects to a source database. A workspace is used to specify source data and how it should be transformed. If this is unfamiliar to you, consult our documentation. Additionally, the GitHub repository for this article contains a script that will set a workspace up for you.
Data generation, the process of creating synthetic data, can be initiated via the Structural web UI. In this article, however, we assume you are more interested in doing this programmatically, and in particular in how to create a synthetic-data database using the Ephemeral platform.
First, we construct a generate request object. In it we give our snapshot a name and allow Ephemeral to size the requested database for us. We employ a naming scheme that yields a manageable library of snapshots, indexed by date with a unique suffix (producing names like Demo.20240615.aBcD); code branch names or job ids could also be appropriate. The authorisation discussion is again postponed to the end of the article.
SNAPSHOT_PREFIX="Demo"
SNAPSHOT_DATE=$(date +%Y%m%d)
# Read from /dev/urandom rather than /dev/random, which can block when entropy is low
SNAPSHOT_SUFFIX=$(head -c 64 /dev/urandom | base64 | tr -dc 'a-zA-Z' | head -c 4)
SNAPSHOT_NAME="${SNAPSHOT_PREFIX}.${SNAPSHOT_DATE}.${SNAPSHOT_SUFFIX}"
GENERATE_REQUEST_BODY=$(cat <<DOC
{
  "ephemeralConfigOverrides": {
    "snapshotName": "${SNAPSHOT_NAME}",
    "snapshotDescription": "Your description here",
    "useCustomDatabaseSize": false,
    "customDatabaseSizeGb": 0,
    "customEphemeralApiKey": "${TONIC_EPHEMERAL_API_KEY}",
    "keepDbActiveAfterGeneration": false
  }
}
DOC
)
To request the snapshot, we use the /api/GenerateData/start endpoint and supply the id of the relevant workspace.
GENERATE_URL="${TONIC_STRUCTURAL_SERVER}:${TONIC_STRUCTURAL_API_PORT}/api/GenerateData/start?workspaceId=${WORKSPACE_ID}"
GENERATE_RESPONSE=$(
  curl -X 'POST' \
    "$GENERATE_URL" \
    -H 'accept: */*' \
    -H "Authorization: ${TONIC_STRUCTURAL_API_KEY}" \
    -H 'Content-Type: application/json' \
    -d "$GENERATE_REQUEST_BODY" \
    2>/dev/null
)
JOB_ID=$(echo "$GENERATE_RESPONSE" | jq -r '.id')
The generate request returns an object containing a job id. We can use the job id to check the generation status by polling the /api/Job/${JOB_ID} endpoint and waiting for the status field value to change to 'Completed'.
JOB_STATUS_URL="${TONIC_STRUCTURAL_SERVER}:${TONIC_STRUCTURAL_API_PORT}/api/Job/${JOB_ID}"
echo "Checking job status - need to wait for 'Completed' state"
echo
while true
do
  JOB_STATUS_RESPONSE=$(
    curl -X 'GET' \
      "$JOB_STATUS_URL" \
      -H 'accept: */*' \
      -H "Authorization: ${TONIC_STRUCTURAL_API_KEY}" \
      2>/dev/null
  )
  JOB_STATUS=$(echo "$JOB_STATUS_RESPONSE" | jq -r '.status')
  if [[ "$JOB_STATUS" == "Completed" ]]
  then
    echo "Job ${JOB_ID} completed"
    echo "Generated snapshot with name ${SNAPSHOT_NAME}"
    break
  elif [[ "$JOB_STATUS" == "Failed" ]]
  then
    echo "Job failed - check the Ephemeral dashboard for job id ${JOB_ID}"
    break
  else
    echo "Job status = ${JOB_STATUS}"
  fi
  sleep 10
done
Once the status reaches 'Completed', our snapshot is available with the name we have given it, ${SNAPSHOT_NAME}. We can hydrate this snapshot at any time to create a database server as needed for our test pipeline.
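Note that the hydration request shown earlier identifies the snapshot by its id (volumeSnapshotId) rather than its name, so you will typically need one lookup step in between. The following is a hypothetical sketch only; the endpoint path and field names here are assumptions of ours, so consult the Ephemeral API documentation for the real ones:
# Hypothetical lookup: list snapshots and select the id matching our name
SNAPSHOT_ID=$(
  curl -X 'GET' \
    "${TONIC_EPHEMERAL_SERVER}:${TONIC_EPHEMERAL_API_PORT}/api/snapshot" \
    -H "Authorization: ${TONIC_EPHEMERAL_API_KEY}" \
    2>/dev/null \
  | jq -r --arg name "$SNAPSHOT_NAME" '.[] | select(.name == $name) | .id'
)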
Authentication
We gloss over this a little in order to maintain the focus of the article, but in brief, authentication to both the Structural and Ephemeral APIs is accomplished by presentation of an API key. We can see these used above as ${TONIC_STRUCTURAL_API_KEY} and ${TONIC_EPHEMERAL_API_KEY}. API keys are acquired from Structural as described in the Structural documentation. The process for Ephemeral is similar.
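The scripts in this article assume both keys, along with the server addresses and ports, are available as environment variables. In a CI system you would typically inject these from your secret store; for local experimentation, something like the following (with your own values substituted for the placeholders) is enough:
# Substitute your own servers, ports, and keys
export TONIC_STRUCTURAL_SERVER="https://structural.example.com"
export TONIC_STRUCTURAL_API_PORT="443"
export TONIC_STRUCTURAL_API_KEY="<your Structural API key>"
export TONIC_EPHEMERAL_SERVER="https://ephemeral.example.com"
export TONIC_EPHEMERAL_API_PORT="443"
export TONIC_EPHEMERAL_API_KEY="<your Ephemeral API key>"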
Conclusion
In this article we saw how we can take a database, mask the sensitive contents and create a snapshot of the results.
We store snapshots as files which can be used in conjunction with dynamically created database servers to quickly, easily and inexpensively create precisely configured environments for development purposes and beyond.
Interested? Try it out yourself or connect with your Customer Success Manager to get started.