Redis: Setting up a regularly updated Cache using Cron and Node

Over the past few days, I've been setting up a redis cache on Azure for a Hacker News visualization tool. While the app has an extensive SQL database containing every Hacker News post, I decided to deploy a redis cache to hold the most recent posts for fast access.

For the posts to be truly recent, the cache has to be updated regularly, which seemed like a natural task for a cron job. However, it gets a bit more complicated when you consider all the technologies required to deploy the app. I'll walk through my strategy here, even though I'm still iterating on it.

[Image: diagram of the stack]

As demonstrated in the image, the whole system works something like this:

1. The cron job on the Virtual Machine will trigger a bash script, updateredis.sh, that will run node app.js.

2. The app.js file will interact with the Hacker News API to pull recent items and store them in the redis cache.

3. In the Hacker News app, the server will pull recent data from the redis cache and send it to the user via a custom API.

The whole system requires a number of different technologies to work, but it is actually simpler than it seems. I'll walk through it here.

Step 1: Cron Job and updateredis.sh

After logging into our Virtual Machine via ssh (ssh azureuser@myvirtualmachine.cloudapps.net), we can create a new cronjob using crontab -e. The resulting file should look something like this:

30 * * * * sh ~/updateredis.sh

This will set the updateredis.sh file to run once an hour, at minute 30 of every hour (the minute is an arbitrary choice in this case). Now, we also need to create the script itself. The updateredis.sh file will have two lines as follows:

#!/bin/bash
node ~/app.js

Keep in mind that for this to work, your app.js must be in the directory specified and your VM must have all necessary components installed (a process unfortunately beyond the scope of this discussion). In the next section, we'll talk about the app.js file that our script should run.
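Going back to the crontab entry above, it can help to see the schedule spelled out. The following is only a sketch to illustrate what "30 * * * *" means (minute 30 of every hour, in the machine's local time); the function name nextRun is made up for this example and is not part of the setup itself:

```javascript
// Hypothetical helper: the next time the crontab entry "30 * * * *" fires,
// i.e. minute 30 of every hour, evaluated in the machine's local time zone.
function nextRun(now) {
  var next = new Date(now.getTime());
  next.setSeconds(0, 0);
  next.setMinutes(30);
  // If minute 30 of this hour is not in the future, move to the next hour.
  if (next.getTime() <= now.getTime()) {
    next.setHours(next.getHours() + 1);
  }
  return next;
}

console.log('Next cache refresh at: ' + nextRun(new Date()).toString());
```

So a run that just finished at 10:30 would be followed by the next one at 11:30, and the cache is never more than about an hour old.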

Step 2: Use node to get data and update redis

Now, we'll need to build an app.js file and use it to get the necessary data from the API and store it in our redis cache. In the simple example below, we do not use real API data, and the data we send to the redis cache is somewhat trivial. The implementation should look something like this:

var express = require('express');
var http = require('http');
var https = require('https');
var app = express();
var redis = require("redis");
// Create the client; the connection to the cache is opened asynchronously
var client = redis.createClient(1234, 'yourawesomecache.redis.cache.windows.net');

// Caches auth credentials and passes them to the cache
client.auth('abcdefghijklmnopqrstuvwxyz', function(){
  console.log("Stored Authentication credentials");
});

// Log to the console to indicate we've connected
client.on('connect', function(){
  console.log('successfully connected to redis');
});

// Remove all elements from Redis Database. Included for example simplicity
client.FLUSHDB();

// Send http request to get desired data. Note: this endpoint is not real
https.get('https://kevinmeurer.com/api/hobbies', function(res){
  var hobbies = '';
  res.on('data', function(chunk) {
    hobbies += chunk;
  });

  res.on('end', function() {
    var hobbyList = JSON.parse(hobbies);
    // Add to redis database
    client.LPUSH("myhobbies", hobbyList, function(err, replies){
      if(err) throw err;
      // Close connection to redis cache (quit waits for pending replies first)
      client.quit();
    });
  });
});

// The script exits once the redis connection has closed

A few notable things are going on in this code, and I'll walk through them. At the top, we require all the necessary modules and open a connection to our redis database with redis.createClient(port, host). The connection is made asynchronously. Because Azure requires authentication, the code also passes the cache's primary key (its password) to client.auth().

Below that, we make a standard https request to the API endpoint, accumulating the response as it arrives in chunks. Once the response has fully loaded, we parse it and add it to the cache by running redis commands on the client variable. Because this operation is also asynchronous, we close the connection to redis in its callback.
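One detail the trivial example glosses over: redis lists hold strings, so if the API returns objects rather than plain strings, each one probably needs to be serialized before it is pushed. Here is a minimal sketch of that step, assuming the parsed response is an array; the helper name toListValues and the sample data are made up for illustration:

```javascript
// Hypothetical helper: turn a parsed API response into JSON strings that
// can be pushed onto a redis list one element at a time.
function toListValues(items) {
  if (!Array.isArray(items)) {
    throw new TypeError('expected the API response to be an array');
  }
  return items.map(function (item) {
    return JSON.stringify(item);
  });
}

// Made-up sample data standing in for a real API response
var sample = [{ id: 1, title: 'Example post' }, { id: 2, title: 'Another post' }];
console.log(toListValues(sample));
```

The result could then be handed to the LPUSH call in place of the raw list, and decoded again with JSON.parse when it is read back.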

Once this file is on our VM (with node installed), every cron run will refresh the Redis cache with new data from the API!
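Since cron runs the script unattended, it may be worth logging connection failures instead of letting them pass unnoticed. The sketch below assumes the callback-style redis client used in this post, which emits an 'error' event when something goes wrong; the helper name failOnRedisError and the stand-in emitter are made up for this example:

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: log redis errors and hand them to a callback that
// decides what to do next. `client` is any emitter that fires 'error'.
function failOnRedisError(client, onFatal) {
  client.on('error', function (err) {
    console.error('redis error: ' + err.message);
    onFatal(err);
  });
}

// Demonstration with a stand-in emitter instead of a real redis client
var standIn = new EventEmitter();
failOnRedisError(standIn, function (err) {
  console.log('a real script would probably exit with a non-zero status here');
});
standIn.emit('error', new Error('connection refused'));
```

In app.js this might be called as failOnRedisError(client, function () { process.exit(1); }), so that a failed refresh shows up as a failed cron run.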

Step 3: Interact with the Redis Cache in the Primary Application

Now that our Redis cache is set up and refreshing on schedule, we can read from it in our application. The setup looks much like the code that writes to the cache. We include the same code at the top of our server file:

var express = require('express');
var http = require('http');
var app = express();
var redis = require("redis");
var client = redis.createClient(1234, 'yourawesomecache.redis.cache.windows.net');

client.auth('abcdefghijklmnopqrstuvwxyz', function(){
  console.log("Stored Authentication credentials");
});

client.on('connect', function(){
  console.log('successfully connected to redis');
});

This code allows you to interact with the Redis cache using any of the usual Redis commands. For example, if I wanted to get the length of my 'myhobbies' list, I could write:

client.LLEN('myhobbies', function(err, length){
  if (err) throw err;
  console.log("LENGTH IS: ", length);
});
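Reading the cached items back works the same way. The following sketch assumes the list entries were stored as JSON strings and that the client exposes a callback-style lrange(key, start, stop, callback), as the redis client in this post does; the helper name getRecent and the stand-in client are made up so the example runs without a real cache:

```javascript
// Hypothetical helper: read the first `count` entries from the head of a
// redis list and JSON-decode each one.
function getRecent(client, key, count, callback) {
  client.lrange(key, 0, count - 1, function (err, values) {
    if (err) return callback(err);
    var parsed;
    try {
      parsed = values.map(function (value) { return JSON.parse(value); });
    } catch (parseErr) {
      return callback(parseErr);
    }
    callback(null, parsed);
  });
}

// Stand-in client for demonstration only; the app would pass the real client
var fakeClient = {
  lrange: function (key, start, stop, cb) {
    cb(null, ['{"id":2}', '{"id":1}'].slice(start, stop + 1));
  }
};

getRecent(fakeClient, 'myhobbies', 10, function (err, items) {
  if (err) throw err;
  console.log(items);
});
```

Inside an Express route, the callback could pass items to res.json(items), which is one way to build the custom API mentioned at the start.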

As a programmer on the project, you now don't need to worry about what is happening on the virtual machine. The cron job will keep the cache up to date, and the app only has to read from it!

Concluding Thoughts

Redis is extremely useful for storing recent, regularly accessed data, and setting it up to be regularly updated isn't as bad as one might think. There are tons of alternative ways to implement something like this, though, so if you prefer another method, tell me in the comments!