Sunday, November 1, 2015

How to Map the Gulag (the visualizing)

The last post was about compiling the data from Memorial's website for a gulag map using Python. Getting the data is the most important part of creating a visualization but it is not as fun as actually making the map. This post should be easier to follow, i.e., it has pictures.

The basic steps for creating a visualization like this are here:
  1. Download, clean and output data (download here and here)
  2. Create visualization
    • Consider potential tools and goals
    • Import data
    • Manage visual effects
    • Create legend
    • Export to video

Tools and Goals

There are different tools for displaying geographical data and each has advantages and disadvantages. What tool to use depends on how you envision the data looking, what features you want to give the audience, how much time you want to put into the visualization and what your technical limitations are.

If you wanted to put a lot of effort into a gulag-data project and know how to create a web application, it might make sense to create an entire website where mapping would be only one element driven by a database of gulag data. This website would give users a great deal of control, letting them search by camp, year, region, camp personnel, documents related to the camps and so on. Memorial just posted something like this for places of incarceration in Moscow. I would love to see a project for the entire gulag along those lines but it would take more data than the Memorial site gives. Moreover, Gulag Maps has already put out a lot of maps based on the Memorial data. The last knock against this option is that it is difficult and time consuming. A really solid project like this might take months (or longer) and multiple researchers/programmers.

A less demanding option would be to use GoogleMaps or OpenLayers to overlay the camps on internet-based maps. You could pull in the data with JavaScript or upload every month as a layer in Fusion Tables (time consuming). Then, using JavaScript (no getting around it), cycle through the dates at a set interval. It would be possible to give users the ability to stop the cycle, click on camps to get more information and see the camps as a heatmap. There are ways of customizing how the data is displayed, although using Google or OL will limit your options (especially Fusion Tables) or give you headaches. The learning curve here can also be steep if you don't already have experience with JavaScript. Speed is probably the biggest issue. Fusion Tables is able to load layers pretty quickly but loading layers on the fly is not the strong suit of GoogleMaps. Here is a basic version I threw together [Update 6/2021: The whole point of this paragraph was to show that doing something like this didn't work. After Google discontinued Fusion Tables, it definitely does not work, and it may not even load at this point]:

If you click on the Start Slides button, blobs come up where camps are. The placement is right but it is not rendering the heat map like I hoped it would. It also has trouble removing the blobs (possibly a problem with my code but I suspect a problem with loading layers automatically). A slide show won't work, although an option might be to let users pick the dates and get a more in depth view of each time period.

Another option is to use a GIS program like QGIS, which is free and open source. The disadvantage of this option is that your output is not interactive. The user can't click on the map once you have exported it to an image or a movie. The advantage is that QGIS allows for easier customization and it has plugins like TimeManager, which takes care of cycling through the data. The other advantage is that clips and images can be shared more easily than a web map that needs to pull data constantly from the internet, and the frame rate is not dependent on how fast Google or OSM can load the layers.

With these tools in mind, I decided to go with the last. The web application was too much work for too small a payoff. A web-based map is an attractive choice if you think users will want to pan and zoom a lot and to stop on certain sites to look at the data. But the tradeoff was that the map would not load 432 different data sets very smoothly. The effect I was going for was a broad visualization of the gulag's development that would look seamless. For this purpose, having a GIS program generate a series of images to turn into a clip was the best option.

To follow along with the rest of this post, you'll need to install QGIS. Once you have installed the program and opened it, go to the menu Plugins -> Manage and Install. Find the plugin TimeManager (one word) and install it. If you are new to GIS and are weirded out that there is no map in this mapping program, go to the menu Web -> OpenLayers plugin -> OpenStreetMap -> OCM Landscape. (Even if you aren't weirded out, open up the OCM background because we will need it.)

Importing Data

To begin with, we will put together the map where different sized points represent different sized camps. The first thing to do is pan to the former USSR with a scale of about 1:90,000,000 (the scale is at the bottom and can be entered manually if you have trouble panning otherwise). It should look something like this:

The OCM Landscape provides a layer that we will use as a backdrop. It's not perfect, since the borders changed from the 1920s to the 1950s and have changed again since. But this layer will give a basic sense of geography. We will be adding three more layers:
  1. The visualization with the camps derived from the timegulag.csv
  2. A timestamp layer with totals by month derived from gulagtotals.csv
  3. A legend to show the size of camps
Adding CSV layers in QGIS is very easy. The sidebar has options for adding all sorts of layers from files and for creating layers as well. What we want is to add a CSV file, so click on the icon that looks like a comma.

This will bring up a new window that gives you options for importing the layer. We need to give QGIS three things in particular: the delimiter (semicolon), which columns hold the x (longitude) and y (latitude) values, and whether the first row contains headings. If the latitude and longitude are in columns headed x and y, QGIS should set them as x and y by default.
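Before importing, it can help to confirm that the file really is semicolon-delimited and carries the x and y headers QGIS looks for. The following is a minimal Python sketch of such a check, not part of the original workflow. The sample row and the `name` and `date` column names are made up for illustration; only x, y and total are mentioned in this post.

```python
import csv
import io

REQUIRED = ("x", "y")  # longitude and latitude headers, as described above


def check_delimited_text(text, delimiter=";"):
    """Return parsed rows, or raise ValueError if x/y headers or values are bad."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    headers = reader.fieldnames or []
    missing = [name for name in REQUIRED if name not in headers]
    if missing:
        raise ValueError(f"missing column(s): {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        lon, lat = float(row["x"]), float(row["y"])
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            raise ValueError(f"line {line_no}: coordinates out of range")
        rows.append(row)
    return rows


# Made-up sample in the same shape as a semicolon-delimited export.
SAMPLE = "name;date;total;x;y\nExample camp;1924-01-01;2500;35.7;65.0\n"
rows = check_delimited_text(SAMPLE)
print(len(rows), rows[0]["x"], rows[0]["y"])
```

Parsing the same text with the wrong delimiter raises an error because the header row collapses into a single column, which is roughly the symptom you would see in the QGIS import preview.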

Pressing OK should pull up a map like this, more or less:

All the camps for the entire thirty-six year period are on the map. A good start. Next, we want to import a layer with the totals from the gulagtotals.csv file. This file contains the total number of camps the Memorial data says were open and the estimate of the total population of the camps. QGIS does not make it possible to include a simple label that will change with each month. So we need to give this label its own coordinates that can carry its own data. I put it at 76,15, which puts it just south of Svalbard in the northwest corner.

Keep in mind that this one point isn't actually one point but 432 points stacked upon one another, each linked to the data for one month's totals.
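To make the stacking concrete, here is a small Python sketch that builds 432 monthly rows sharing a single coordinate. It is an illustration under assumptions, not the script that produced gulagtotals.csv: the `date` column name is hypothetical, "76,15" is read as latitude 76 and longitude 15, and the totals are placeholder zeros rather than Memorial's numbers.

```python
import csv
import io
from datetime import date

LABEL_X, LABEL_Y = 15, 76   # assumed reading of "76,15": longitude 15, latitude 76
START_YEAR = 1924
MONTHS = 432                # 36 years of monthly frames


def month_starts(start_year, count):
    """Yield the first day of `count` consecutive months from January of start_year."""
    for i in range(count):
        yield date(start_year + i // 12, i % 12 + 1, 1)


def totals_rows(monthly_totals):
    """Pair each month with (population, camps) and the shared label coordinate."""
    for day, (total, camps) in zip(month_starts(START_YEAR, MONTHS), monthly_totals):
        yield {"date": day.isoformat(), "total": total, "camps": camps,
               "x": LABEL_X, "y": LABEL_Y}


# Placeholder numbers only; the real totals come from the Memorial data.
placeholder = [(0, 0)] * MONTHS
rows = list(totals_rows(placeholder))

# Write the rows as semicolon-delimited text, the format imported above.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "total", "camps", "x", "y"],
                        delimiter=";")
writer.writeheader()
writer.writerows(rows)
print(rows[0]["date"], rows[-1]["date"], len(rows))
```

Every row has the same x and y, so QGIS draws them on top of each other; only the date and the totals differ.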

Managing Visual Effects

Everything has been loaded and can be modified to look like we want. To open up the properties of the main timegulag layer, double click on it in the side menu.

Clicking this button opens up a window with an intimidating number of options. Here we could make a heatmap just as easily but we will make a map with graduated symbols, points that will change with the data. In the first pulldown menu, change Single Symbol to Graduated Symbol.

The data we want to take into account is under the "total" column, so in the new menu that appears, select "total" from the dropdown menu next to Column. There are two ways QGIS will show the size of the data: by color and by size. Differentiating the colors would make sense if the data could distinguish the points by some category, e.g., if we knew the type of work prisoners did we might designate green for forestry, red for construction and so on. Increasing the intensity of color might also make sense for displaying size (e.g., from white for small to red for large), but increasing the size of the points seems more intuitive.

Use the dropdown to change Method to Size. In the Classes box, you can allow QGIS to sort the data for you by setting a number of different classes and hitting Classify. This function is especially useful if you want to use standard deviations or to make sure there are equal counts of points in each category. I have a pretty good sense of the data, though, so I want to make my own categories. Click Add Class four times. This will add four empty categories. Double click on the dots under Symbol to change their size. I made them 1, 2.75, 4.5 and 6.25 but they can be made smaller or larger. In the second column, I changed the values to 0-15000, 15001-50000, 50001-100000 and 100001-1000000. These are also flexible, depending on what you think counts as a big camp. I found that setting the largest camp size at 100001+ gave a nice effect because it meant that only a handful of the very biggest camps met the threshold. I also like to use red for the points so I went to Symbol and changed the color to red. Here is what this menu should look like when you are through:
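Separately from the menu itself, the class boundaries can be written out as a small lookup. This is only a Python sketch of the graduated-symbol logic using the four ranges and sizes given above; it is not how QGIS stores or applies the classes.

```python
from bisect import bisect_left

# Upper bound of each class and the symbol size used for it in this post.
UPPER_BOUNDS = [15000, 50000, 100000, 1000000]
SIZES = [1, 2.75, 4.5, 6.25]


def symbol_size(total):
    """Return the point size for a camp population, or None if out of range."""
    if total < 0 or total > UPPER_BOUNDS[-1]:
        return None
    # bisect_left finds the first class whose upper bound is >= total.
    return SIZES[bisect_left(UPPER_BOUNDS, total)]


for population in (800, 15000, 15001, 75000, 100001):
    print(population, symbol_size(population))
```

Changing what counts as a big camp is then just a matter of editing the two lists, which mirrors editing the value and symbol columns in the QGIS dialog.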

There will now be a bunch of differentiated points on the map from the entire data set, regardless of year. To change this we need TimeManager. There should be a menu below the map itself after you installed TimeManager. If there isn't, go to Plugins -> TimeManager -> Toggle Visibility. The screen should look like this, more or less:

TimeManager is very easy to use. Click settings->Add Layer. In the menu that appears, select timegulag. The plugin will figure out when the data start and end--pretty neat. We need to tell it how often to refresh the data, because TimeManager defaults to minutes. Instead, click the dropdown menu and select months. If you felt like my map was too slow and want to pick up the pace, you can change the interval to three or six months. You could even go by year. When all is done, TimeManager will set the date to January 1924, giving us one point for Solovetskii and kind of an ugly timestamp in the southeast corner.
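Conceptually, each frame shows only the rows whose date falls inside the current interval, and a longer interval means fewer frames. The Python sketch below illustrates that idea. It is not TimeManager's implementation, and it assumes the data has one row per camp per month with a `date` field, which is a hypothetical layout.

```python
from datetime import date


def add_months(day, months):
    """Shift a first-of-month date forward by a number of months."""
    index = day.year * 12 + (day.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def frames(start, total_months, step):
    """Yield (frame_start, frame_end) pairs; frame_end is exclusive."""
    for offset in range(0, total_months, step):
        yield add_months(start, offset), add_months(start, offset + step)


def visible(rows, frame_start, frame_end):
    """Keep the rows whose date falls inside the current frame."""
    return [r for r in rows if frame_start <= r["date"] < frame_end]


START = date(1924, 1, 1)
for step in (1, 3, 6, 12):
    print(step, "month step ->", len(list(frames(START, 432, step))), "frames")

# Made-up rows: one camp open in January and February 1924.
rows = [{"name": "Example", "date": date(1924, 1, 1)},
        {"name": "Example", "date": date(1924, 2, 1)}]
first_start, first_end = next(frames(START, 432, 1))
print(visible(rows, first_start, first_end))
```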

We want to add a label with the running totals and a nicer timestamp. Double click the layer gulagtotals in the left pane. Here we don't want any point, so change the size in the Style tab from 2 to 0. The point disappears from the map. Go to the Labels tab. Check the box next to "Label this layer with." To get the label we want, we will have to use a formula. Here is what that looks like:

format_date(animation_datetime(), 'MMMM yyyy') +'\nCamp Population: '+to_string(total)+'\nNumber of Camps: '+to_string(camps)

That is the only line of code in this post and it is not so bad. If you need to generate an unusual output with a formula, the QGIS expression editor is good about explaining what the functions do, and the bottom of the window prints out a sample output so you can check if you are getting the result you need before putting it on the map. The most difficult part is knowing to use animation_datetime(). It returns whatever date TimeManager is currently using. The function format_date() takes a date (the first argument) and converts it for us to MMMM (meaning months written out in full) and yyyy (years, all four digits). The "\n" means new line, and then we print "Camp Population: " followed by the camp population from the total column. QGIS allows us to convert the total column to a string (characters rather than numbers) with to_string(). Then we add another new line and do the same thing for the number of camps with the camps column. Save the formula. Press OK and your map will look something like this.
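As an aside, for readers more comfortable with Python than with QGIS expressions, the same label logic can be sketched as a plain function. This is an analogy only: `build_label` is a hypothetical name, the date is passed in by hand instead of coming from animation_datetime(), and the numbers are invented.

```python
from datetime import date

MONTH_NAMES = ("January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December")


def build_label(frame_date, total, camps):
    """Mimic the label expression: full month name, year, then the two totals."""
    heading = f"{MONTH_NAMES[frame_date.month - 1]} {frame_date.year:04d}"
    return (heading
            + "\nCamp Population: " + str(total)
            + "\nNumber of Camps: " + str(camps))


# Made-up numbers, only to show the shape of the output.
print(build_label(date(1924, 1, 1), 2500, 1))
```

The str() calls play the role of to_string(): without them, adding a number to a string would fail in Python.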

TimeManager isn't handling the gulagtotals layer like it is for the camps themselves. It is displaying the right date--because it depends on TimeManager for the date--but it isn't displaying the right number of camps or population. Add the gulagtotals layer in TimeManager->Settings. The multiple date labels will become one (the correct one). In the same TimeManager->Settings menu, uncheck Display frame start time on map to remove the ugly timestamp. The placement of our new timestamp isn't great right now and I'm not crazy about the font. Double click on the gulagtotals layer again and go to the Labels tab to play around with the appearance. For a font, I like Source Code Pro, personally, and probably a little bigger. In the same box with Text there is a window for Placement. I put it in the middle upper area and the label looks pretty good on the map. But maybe you would prefer to put it in the Pacific or over Alaska. That is also possible by playing around with the Offset X,Y options. Here is what the map I have looks like after this step.

If you press the start button on TimeManager, the magic starts to happen. Unfortunately, this data set is not great for big changes at the beginning. And TimeManager does not load layers especially quickly with this much data. But it is magic nonetheless!

Creating a Legend

The last step for this map is to create a legend. If we were making a normal map in QGIS, it would be possible to go to Project->New Print Composer and draw on the map. However, that would mean stopping the map each month and printing manually. Not a great plan. Instead, we can create a legend by making a set of points on the map, setting their size to the same size as our camps and labeling them with the sizes of the camps. Click New Shapefile Layer, underneath the CSV layer button on the left pane. The type is Point and it will need a New Attribute of the type Text data. You can call the attribute "label." Save it somewhere on your computer. I'm saving it as "label" and will refer to it as the label layer. Then click on the pencil icon beneath the Project menu to begin editing that layer and make points by clicking the icon with three points on the pencil's right (the Add Features button).

Create four points from north to south where they will be unobtrusive. I think they look nice in the Pacific and they won't be confused for camps there. Give the points the IDs 1, 2, 3, 4 and the labels 1-15000, 15001-50000, 50001-100000, 100001+ (or other labels if your camps have a different graduated scale). It should look something like this:

Press the save button next to the pencil and click the pencil again to stop editing. Double click the label layer so we can change the visuals. Just like the camps themselves, change the symbol type to Graduated and the method to Size. Here the Column will be id. Add four classes. Make the symbol of each of the four classes equal in size to the corresponding symbol for the camps layer. (Mine are 1, 2.75, 4.5 and 6.25.) Make the Value of the symbols 1-1.9, 2-2.9, 3-3.9 and 4-5, so that each point is covered by one category. In the Label tab, click "Label this layer with" and select "label." Change the font as you want. Hit OK and the map will look something like this:

That looks almost right but it needs some tweaking. Click the pencil again and then the icon with a green blob and an arrow (Move Features). It would be nice if the big points didn't overlap and if the label didn't partially cover Kamchatka. Once the points are in the right spots, click the save icon. The label needs fixing as well, since it is right on top of the points. Double click the label layer in the left pane and click the Label tab. There, select Placement, Offset from Point and set the label to the right. It probably needs to be offset a few degrees to the east as well. I offset X by four degrees and that looks pretty good.


Exporting to Video

TimeManager has a nice function that exports a map for each time segment as an image file. Make sure the slider is at the beginning or set Time frame start to 1924-01-01 00:00:00.000. Click Export Video in the TimeManager menu, set the folder for export and press OK. It will warn you not to play with the map while it is exporting. Don't move the map! TimeManager isn't really doing anything fancy. It basically executes a Print Screen command for the map window every time it loads a new set of data. If you pan to a different part of the map, TimeManager will export whatever the map looks like afterwards, ruining the clip. It should take ten or so minutes for this data set to export. The more images TimeManager has to save, the longer it will take.

Once TimeManager finishes, we have 432 images (Frame000, Frame001...Frame431). I then used ImageMagick (a command line image editing utility) to crop all the images because so much of Alaska was showing in all the frames. ImageMagick is a great and simple program that allows you to conduct mass edits of an entire folder of images. I suspect some video editing software can crop a sequence, too. But you could create a video without doing this as well. If you are using a Mac, I think you can just drag the images into a new project in iMovie. The same is true of Windows Movie Maker, which is mostly adequate but for some reason cropped random images when I tried it. So instead I used VirtualDub, a very basic video editor. (If you end up using VirtualDub, just open the first image and it will load the entire series.) With any editor the frame rate probably needs to be faster than the default. I chose somewhere between five and six frames per second (about 180 milliseconds per frame), which gave the video a nice quick pace without being too fast. It might make sense to go slower if you want it to be easier to see the monthly statistics. Then export the project as an avi.
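The frame-rate arithmetic is easy to check with a few lines of Python. This sketch only does the math for 432 frames and lists the frame names in the pattern mentioned above; it does not touch the images, and the function names are made up for illustration.

```python
def frame_names(count, prefix="Frame", digits=3):
    """Names in the pattern described above: Frame000, Frame001, ..."""
    return [f"{prefix}{i:0{digits}d}" for i in range(count)]


def frame_delay_ms(fps):
    """Time each image stays on screen, in milliseconds."""
    return 1000.0 / fps


def clip_seconds(frame_count, fps):
    """Total running time of the clip."""
    return frame_count / fps


names = frame_names(432)
print(names[0], names[-1])
for fps in (5, 5.5, 6):
    print(fps, "fps:", round(frame_delay_ms(fps)), "ms per frame,",
          round(clip_seconds(432, fps)), "s total")
```

Halving the frame rate doubles the running time, which is the tradeoff to weigh if viewers need longer to read each month's statistics.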


I decided to skip over heatmaps for a bit because creating a heatmap layer is less complicated than creating the pointmap and involves the same steps. A heatmap is a map that shows density by varying the color or opacity of a marker. Making one of these is pretty easy. No legend is needed so the label layer can be unchecked. We could alter the settings of the timegulag layer but better to uncheck it and create a new layer, in case you want to go back to the point map later. Just import timegulag.csv again by clicking the comma button to import csv layer. It would make sense to rename the layer to something like heatgulag to avoid confusion.

Setting up a heatmap is similar to the graduated symbols. Double click on the layer heatgulag. In the menu Style, select Heatmap in the first dropdown. It will automatically choose a white-black scale that needs to be changed. Click Edit, select whatever color you like as color 2 and as color 1 choose transparent--otherwise this layer will cover the entire map with color 1. Type is "continuous" or else the heatmap points will come with weird borders. Weight the points by "total," the size of the camp, or else the map will weight each camp equally. Set render quality all the way to Best so that the heatmap points are not pixelated. Just like the graduated point map, TimeManager will print the heat map over time and you can compile it into a video.
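To see why weighting by "total" matters, here is a toy Python sketch of a weighted heat value. It uses a generic quartic kernel as an assumption for illustration; I am not claiming this is the kernel or the algorithm QGIS uses internally, and the two camps are invented.

```python
import math


def quartic_kernel(distance, radius):
    """A common bell-shaped kernel: 1 at the point, falling to 0 at the radius."""
    if distance >= radius:
        return 0.0
    return (1 - (distance / radius) ** 2) ** 2


def heat_at(location, camps, radius, weighted=True):
    """Sum each camp's kernel value at `location`, optionally scaled by its total."""
    heat = 0.0
    for camp in camps:
        d = math.dist(location, (camp["x"], camp["y"]))
        weight = camp["total"] if weighted else 1
        heat += weight * quartic_kernel(d, radius)
    return heat


# Made-up camps: one small, one large, far apart in map units.
camps = [{"x": 0.0, "y": 0.0, "total": 500},
         {"x": 10.0, "y": 0.0, "total": 50000}]

for spot in ((0.0, 0.0), (10.0, 0.0)):
    print(spot,
          "unweighted:", heat_at(spot, camps, radius=3, weighted=False),
          "weighted:", heat_at(spot, camps, radius=3))
```

Without weights the two camps look identical; with weights the large camp is a hundred times hotter, which is the difference the "total" setting makes on the map.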

This walkthrough was long but not difficult. There was only one line of code, really. I didn't talk about how to add cities to the map but anyone who followed along should be able to figure out how to create a new layer, add points and style the city labels as needed. Just like adding cities, you could juxtapose other layers as well--natural resources and camps, total population and prisoner density, transportation routes and camps.

GIS provides powerful ways of displaying data with a gentle learning curve. Time lapse maps in particular are a good way of showing change over time and making them is not difficult with QGIS and TimeManager. Not only are they capable of big-picture visualizations over long periods of time but they are also good for reconstructions of fine-grained historical data. For example, here is a little preview of what I am working on now, mapping the flight network of the USSR:
