Running RethinkDB on Azure

Over a year ago a friend of mine sparked my interest in RethinkDB. At the time I played with it a little and thought it looked like a nice database. After that I focused on other topics, but recently I came back to RethinkDB. In this post I would like to explain how you can set up RethinkDB on Azure and play around with it.

Disclaimer: The described setup is for a development environment. It is not recommended for production usage.

What is RethinkDB?

So, what is RethinkDB? Simply put, RethinkDB is a JSON document store.

Okay, I know a lot of document stores. How is RethinkDB different?

Good question! RethinkDB shares many features with other document databases, but a few set it apart. To help you decide which database system to use, here are some of RethinkDB's features:

  • Distributed joins
  • Map/Reduce
  • Web Administration Tool
  • MVCC concurrency
  • ReQL (the query language of RethinkDB)
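The Map/Reduce bullet is easiest to picture with a tiny example. The following is a plain JavaScript sketch of the pattern over an in-memory array shaped like the brewery documents used later in this post; it does not talk to RethinkDB at all. The comment shows how I would expect the same computation to look in ReQL with its map and reduce terms, which RethinkDB can distribute across the servers of a cluster. The function name countBeers is my own.

```javascript
// Plain JavaScript sketch of map/reduce; no database involved.
// The sample documents mirror the shape of the Companies table used later.
const companies = [
  { name: "Affligem", location: "Belgium",
    beers: [{ name: "Postel Blond", level: 7 }, { name: "Affligem Tripel", level: 9.5 }] },
  { name: "Diebels", location: "Germany",
    beers: [{ name: "Diebels Pils", level: 4.9 }] },
];

// Map phase: turn every document into a partial result (here: its beer count).
// Reduce phase: combine the partial results pairwise into a single value.
// Roughly equivalent ReQL (Data Explorer):
//   r.db('Breweries').table('Companies')
//     .map(function (c) { return c('beers').count(); })
//     .reduce(function (a, b) { return a.add(b); })
function countBeers(docs) {
  return docs
    .map((company) => company.beers.length)
    .reduce((a, b) => a + b, 0);
}

console.log(countBeers(companies)); // 3
```

Because each map step only looks at one document, the partial results can be computed wherever the documents live and combined afterwards, which is what makes the pattern a good fit for a distributed database.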

This post will not focus on comparing RethinkDB with other systems. Luckily, the RethinkDB team publishes comparisons with other systems, especially MongoDB, on the RethinkDB website. From an administrative perspective (and also for evaluation) the web administration tool stands out. You will see why later.

Set up RethinkDB on Azure

Normally I like to use CentOS as the operating system for databases such as MongoDB. With RethinkDB I'm going a different way and using Ubuntu Server as the host. What is the reason behind this decision? Installing RethinkDB on Ubuntu Server is straightforward, whereas with CentOS running on Azure you have to do some extra work, for example to connect to the web administration tool.

So, let's get started!

Head over to the Azure Portal and go to the VM gallery.

We will use an Ubuntu Server VM for our database. Note that I don't use the latest version of Ubuntu Server (14.10 at the time of writing): something seems to go wrong when installing RethinkDB via apt-get on it. On version 14.04 everything should work smoothly.

On the next pages you have to configure the VM.

A point worth mentioning: I personally like to connect to a VM with a username/password combination, but you can also use an SSH key for authentication. Next we do some further configuration. I don't like the automatically generated storage account very much; creating it myself gives me more control over some key settings, such as redundancy. But that is just my own opinion. If you only want to play around and delete the VM afterwards, you can safely go with the automatically generated storage account.

After the setup is complete we need access to our VM. A popular tool for this is PuTTY. The only thing we have to do is enter the VM's host name (YOUR-VM-NAME.cloudapp.net). The SSH endpoint was already configured during VM creation (if you simply clicked through with the defaults).

If PuTTY shows a security alert while connecting to the VM, you can accept it. This is normal the first time you connect, because PuTTY has not yet cached the server's host key.

Log into the VM with the credentials you created before. Now it is time to install RethinkDB. We will install it from the prebuilt binary packages; you can also build RethinkDB from source. The following four lines set up everything necessary.

    source /etc/lsb-release && echo "deb http://download.rethinkdb.com/apt $DISTRIB_CODENAME main" | sudo tee /etc/apt/sources.list.d/rethinkdb.list
    wget -qO- http://download.rethinkdb.com/apt/pubkey.gpg | sudo apt-key add -
    sudo apt-get update
    sudo apt-get install rethinkdb
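The first command reads DISTRIB_CODENAME from /etc/lsb-release (trusty on Ubuntu 14.04) and writes a matching repository line to /etc/apt/sources.list.d/rethinkdb.list. To make that step explicit, here is a small JavaScript sketch that performs the same string work on the file's contents. The helper name buildAptSourceLine and the sample file contents are my own assumptions, and the sketch only builds the line; it does not write any file.

```javascript
// Sketch: derive the RethinkDB apt repository line from /etc/lsb-release,
// mirroring `source /etc/lsb-release && echo "deb ... $DISTRIB_CODENAME main"`.
// buildAptSourceLine is a hypothetical helper used only for illustration.
function buildAptSourceLine(lsbReleaseText) {
  // /etc/lsb-release consists of KEY=value lines, e.g. DISTRIB_CODENAME=trusty
  const vars = {};
  for (const line of lsbReleaseText.split("\n")) {
    const match = line.match(/^\s*([A-Z_]+)=(.*)$/);
    if (match) {
      // Trim whitespace and strip optional surrounding double quotes.
      vars[match[1]] = match[2].trim().replace(/^"(.*)"$/, "$1");
    }
  }
  if (!vars.DISTRIB_CODENAME) {
    throw new Error("DISTRIB_CODENAME not found in lsb-release contents");
  }
  return `deb http://download.rethinkdb.com/apt ${vars.DISTRIB_CODENAME} main`;
}

// Sample contents as they might look on Ubuntu 14.04 (assumed for illustration).
const sample = [
  "DISTRIB_ID=Ubuntu",
  "DISTRIB_RELEASE=14.04",
  "DISTRIB_CODENAME=trusty",
  'DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"',
].join("\n");

console.log(buildAptSourceLine(sample));
// deb http://download.rethinkdb.com/apt trusty main
```

This also shows why the release matters: the codename ends up in the repository URL, so apt only finds packages if the repository provides a build for your release's codename.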

Now it is time to start RethinkDB. Type rethinkdb at the command line and the database will start. At this point you should ask yourself how we can connect to our database from outside the VM. If you look closely at the output after starting the database, you will notice that RethinkDB listens on some standard ports (which can be configured) for different tasks.

Let's open these ports. Switch back to the Azure portal and go to the ENDPOINTS section of your VM. We will now add the three missing endpoints: one for intracluster connections (port 29015 by default), one for client drivers (port 28015) and one for the web administration tool (port 8080). Click Add at the bottom of the page and add the endpoint for intracluster connections.

Do the same for the other two endpoints. Your endpoints page should now look similar to the following one.
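For reference, here are the three endpoints written down as a small JavaScript data sketch, using RethinkDB's default ports (29015 for intracluster connections, 28015 for client drivers, 8080 for the web administration tool). The endpoint names and the helper functions azureEndpoints and adminUrl are illustrative assumptions, not something Azure or RethinkDB requires; if you changed the ports when starting RethinkDB, adjust the numbers accordingly.

```javascript
// Sketch: the three Azure endpoints needed for RethinkDB's default ports.
// Endpoint names are my own choice; Azure lets you pick any name.
// Public and private ports are kept identical here for simplicity.
const RETHINKDB_DEFAULT_PORTS = {
  intracluster: 29015, // connections between RethinkDB servers
  clientDriver: 28015, // connections from client drivers
  webAdmin: 8080,      // web administration tool (HTTP)
};

// Turn the port map into rows resembling the Azure endpoint form fields.
function azureEndpoints(ports) {
  return Object.entries(ports).map(([name, port]) => ({
    name,
    protocol: "TCP",
    publicPort: port,
    privatePort: port,
  }));
}

// Build the admin URL for a VM; "YOUR-VM-NAME" below is a placeholder.
function adminUrl(vmName, port = RETHINKDB_DEFAULT_PORTS.webAdmin) {
  return `http://${vmName}.cloudapp.net:${port}/`;
}

console.table(azureEndpoints(RETHINKDB_DEFAULT_PORTS));
console.log(adminUrl("YOUR-VM-NAME"));
// http://YOUR-VM-NAME.cloudapp.net:8080/
```

Keeping public and private ports equal is a convenience choice; Azure also allows mapping a different public port onto RethinkDB's private port.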

Now let's try to connect to the web administration tool running on port 8080. Type http://YOUR-VM-NAME.cloudapp.net:8080/ into your browser. You will probably notice that nothing happens. That is one of the security features built into RethinkDB: by default it binds only to the local interface, so it does not accept connections from other machines. To get access to the web administration tool, stop the database and start it again with rethinkdb --bind all. Now you should see the web administration tool.

Connect to RethinkDB

Now we have a RethinkDB single node installation up and running. It is time to play around with it!

Open the web administration tool (if it isn't open already) and switch to the Tables view. You will notice that a database called test already exists.

Let's delete this database and create a new one; I'll call it Breweries. A database can contain several tables, but for now we only need one, called Companies. Next, let's fill it with some data using ReQL. We can do this in the web administration tool as well: just go to the Data Explorer. The starting point for every query is a single character: r. Let's create the Affligem brewery as the first company.

r.db('Breweries').table('Companies').insert({
    name: "Affligem",
    location: "Belgium",
    beers: [
        {
            name: "Postel Blond",
            level: 7
        },
        {
            name: "Affligem Tripel",
            level: 9.5
        }
    ]
})

This should give us the following result:

{
    "deleted": 0 ,
    "errors": 0 ,
    "inserted": 1 ,
    "replaced": 0 ,
    "skipped": 0 ,
    "unchanged": 0
}
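The counters in this result describe what happened to each document. As a rough mental model, here is a toy in-memory sketch in plain JavaScript (not RethinkDB's actual implementation) of an insert into a table whose primary key is id: documents without an id get a generated UUID, and a document whose id already exists counts as an error under the default conflict behavior. RethinkDB's insert documentation describes a generated_keys field listing such generated ids, which the toy mimics. The function insertInto and the Map-based table are hypothetical.

```javascript
const { randomUUID } = require("crypto");

// Toy in-memory model of an insert into a table keyed by "id".
// Illustrative only: it mimics the shape of the summary object, not RethinkDB internals.
function insertInto(table, docs) {
  const result = {
    deleted: 0, errors: 0, inserted: 0,
    replaced: 0, skipped: 0, unchanged: 0,
    generated_keys: [],
  };
  for (const doc of Array.isArray(docs) ? docs : [docs]) {
    const row = { ...doc };
    if (row.id === undefined) {
      // No primary key given: generate a UUID for it.
      row.id = randomUUID();
      result.generated_keys.push(row.id);
    } else if (table.has(row.id)) {
      // Default conflict behavior: count an error and keep the existing row.
      result.errors += 1;
      continue;
    }
    table.set(row.id, row);
    result.inserted += 1;
  }
  return result;
}

const companiesTable = new Map();
console.log(insertInto(companiesTable, {
  name: "Affligem",
  location: "Belgium",
  beers: [{ name: "Postel Blond", level: 7 }, { name: "Affligem Tripel", level: 9.5 }],
}));
```

The counters that stay at 0 here (deleted, replaced, skipped, unchanged) only become relevant for other write operations, or for inserts that use a non-default conflict strategy.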

Have you noticed something? Pretty amazing, isn't it? You get autocompletion while typing in the console. And that isn't all: you also get a description of each function and an example of how to use it. Pretty neat!

Let's add a second company:

r.db('Breweries').table('Companies').insert({
    name: "Diebels",
    location: "Germany",
    beers: [
        {
            name: "Diebels Pils",
            level: 4.9
        }
    ]
})

Now let's retrieve our documents.

r.db('Breweries').table('Companies')

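That should give us something like the following. RethinkDB added an id field to each document because we did not specify a primary key. Also note that a plain table read does not guarantee any particular order (Diebels shows up first here even though it was inserted second); append .orderBy('name') to the query if you need a stable order: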

[
    {
        "beers": [
            {
                "level": 4.9 ,
                "name":  "Diebels Pils"
            }
        ] ,
        "id":  "88aa6691-d4ed-4cfe-805f-d3e9b7e8242b" ,
        "location":  "Germany" ,
        "name":  "Diebels"
    } ,
    {
        "beers": [
            {
                "level": 7 ,
                "name":  "Postel Blond"
            } ,
            {
                "level": 9.5 ,
                "name":  "Affligem Tripel"
            }
        ] ,
        "id":  "452ce8ff-5f79-4c82-bc26-de54ab7b48fc" ,
        "location":  "Belgium" ,
        "name":  "Affligem"
    }
]
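Once the documents are in place, ReQL can do much more than return the whole table. As an illustration, here is a plain JavaScript sketch, run against an in-memory copy of the two documents above (ids omitted), that lists all beers with a level above 5, sorted by name. The comment shows how I would expect to write the same query in ReQL using concatMap, filter and orderBy; the threshold and the function name beersAbove are assumptions for the example.

```javascript
// Plain JavaScript sketch over an in-memory copy of the two documents above.
const companies = [
  { name: "Diebels", location: "Germany",
    beers: [{ name: "Diebels Pils", level: 4.9 }] },
  { name: "Affligem", location: "Belgium",
    beers: [{ name: "Postel Blond", level: 7 }, { name: "Affligem Tripel", level: 9.5 }] },
];

// Roughly equivalent ReQL (Data Explorer):
//   r.db('Breweries').table('Companies')
//     .concatMap(function (c) { return c('beers'); })
//     .filter(function (b) { return b('level').gt(5); })
//     .orderBy('name')
function beersAbove(docs, minLevel) {
  return docs
    .flatMap((company) => company.beers)      // concatMap: flatten the beer arrays
    .filter((beer) => beer.level > minLevel)  // keep only beers above the threshold
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0)); // orderBy('name')
}

console.log(beersAbove(companies, 5).map((b) => b.name));
// [ 'Affligem Tripel', 'Postel Blond' ]
```

The nested beers arrays are why a flattening step comes first: filtering on beer level only makes sense once every beer is its own element in the sequence.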

You can also switch between different views (the table view is shown in the following picture):

Conclusion

In this post we looked at what RethinkDB is and how you can install it on an Ubuntu Server VM running on Microsoft Azure. We focused primarily on interacting with the database through the web administration tool. Of course, this is not the end. From here you can set up your own cluster, extend your monitoring and use the client drivers. There are official drivers for JavaScript, Ruby and Python, and there is also a community-driven driver for .NET.

Have fun!

Jan (@Horizon_Net)
