Monday, November 23, 2015

Flying the USSR: Summer of 1948

In one of the last posts, I linked to a visualization I like a lot--Ben Schmidt's 100 Years of Ships based on ship records digitized for climate data research. As it happened, I was thinking about this visualization when I came across KGB records of ships arriving in Soviet Ukraine from abroad. They are great sources, but only a handful of ships arrived each month, the coordinates of their journey were not given and the reports weren't regular. I thought about trying to put together a visualization from them, but they can't produce a shipping map like the shipping records from the 18th and 19th centuries. Nonetheless, the records led me to think about mobility within the USSR and the socialist bloc. Most people who have been to the former USSR have done some train travel, which has a long history (part of which, the cheapo platskart section, is coming to an end). However, it struck me that I didn't know that much about how the Soviet Union flew. It also seemed obvious that flights would make an interesting visualization, and one that could be revealing for thinking about how technology and social geography influenced travel.

I dug around a bit and found that plane enthusiasts are digitizing old Aeroflot flight schedules. This site is full of scanned timetables, like this one for Moscow from 1964:


What really interested me was not just looking at one city's schedule but putting together the whole air network, at least for a limited period of time. The owner of a Russian site called Avia History put up a large number of spreadsheets with timetables. Even better, it has timetables from seven cities for the summer of 1948: Adler (Sochi), Kiev, Leningrad, Moscow, Novosibirsk, Simferopol and Sverdlovsk (Ekaterinburg). Using these schedules, I came up with a visualization of the flight routes from that year, represented as beginning on one day (but lasting three days in the case of routes with multiple stops):



And here is a longer video (~12 minutes but with music, so maybe people will stick with it for a few minutes) covering May 5 to September 14. It's worth watching at least a little of this video, since it gives a sense of the daily rhythms of the network. If you watch a month or two, you can see the flights shift somewhat to the south, toward the Caucasus and Crimea, as the Soviet Union went on vacation.



Here is how I made these videos: I geocoded the start and end location of each flight, then worked out the flight time between the takeoff city and the destination and the orientation of the plane. Then, for each flight I generated the location and orientation of the plane every five minutes as if it were traveling in a direct line to the destination at a constant speed the entire time. I exported this flight path plus color coding for flight type to a csv (see here). For the long flight video, I exported every flight for each day the flight was scheduled from May 5 to September 14. Then, I used the spreadsheet as a time layer in QGIS with TimeManager, following the same steps from the previous post about mapping the gulag.
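The interpolation step can be sketched in Python roughly as follows. This is a minimal illustration, not the script used for the videos: the function names, the tuple layout of the output rows and the choice to interpolate linearly in latitude/longitude space are all assumptions made for the example.

```python
import math
from datetime import datetime, timedelta

def initial_bearing(lat1, lon1, lat2, lon2):
    """Compass bearing in degrees (0 = north, clockwise) from point 1 toward point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360) % 360

def flight_track(start, end, depart, arrive, step_minutes=5):
    """Rows of (timestamp, lat, lon, bearing) along a straight line at constant speed.

    start and end are (lat, lon) tuples; depart and arrive are datetimes.
    The line is straight in lat/lon space, which is a simplification.
    """
    (lat1, lon1), (lat2, lon2) = start, end
    total_seconds = (arrive - depart).total_seconds()
    bearing = initial_bearing(lat1, lon1, lat2, lon2)
    rows = []
    t = depart
    while t < arrive:
        frac = (t - depart).total_seconds() / total_seconds
        rows.append((t, lat1 + frac * (lat2 - lat1), lon1 + frac * (lon2 - lon1), bearing))
        t += timedelta(minutes=step_minutes)
    rows.append((arrive, lat2, lon2, bearing))  # always finish exactly at the destination
    return rows
```

Each row could then be written out with the csv module, with an extra column for the flight type, to get a file that TimeManager can read as a time layer.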

Before I start analyzing what these clips and the data in general reveal about flight in the early postwar USSR, I'll mention caveats. The person who put up the spreadsheets does not give the source for the data in the 1948 spreadsheets. Spreadsheets from other years give a source--an official timetable or a regional newspaper. I am assuming these timetables come from a similar source, although my attempts to contact the site's owner did not produce any results.

These are Aeroflot flights, that is, civil aviation (passenger, mail and cargo). They passed through, began or ended in one of the cities that I was able to find data for. This simulation probably captures 95+ percent of non-military flights. Obviously, having the schedule of every city would have been ideal. However, these flight schedules carried the entire route a plane took. So, for instance, both the Adler and Moscow timetables give Flight 208 as Adler-Moscow but the flight route went Tbilisi-Adler-Krasnodar-Rostov-Voronezh-Moscow. Those other flights (e.g., Tbilisi-Adler) are all listed in the schedule as well. For this reason, I don't think that having the Kazan' or Kuibyshev (Samara) schedules would add many, or any, flights, even though they are two of the biggest nodes in the network. How many flight routes would have originated in those cities that didn't also go through Moscow, Sverdlovsk or Leningrad? The two schedules that are most lacking are probably Khabarovsk and Tashkent, since some flights to distant locales may have originated there.

This is a simulation of the flight schedule, not the actual flights. Weather put flights off course, there were mechanical errors, accidents and so on. The planes take a linear route from their city of departure to their destination, unlike in reality. The visualization 100 Years of Shipping used captain's logs from each trip, which plotted the course of ships with a high level of detail. I don't even have both the exact departure and arrival times. Instead, the schedules show when a plane arrives in one city and when it arrives in the next and do not provide for layovers. This is especially visible on the long-haul Moscow-Siberia-Far East flight routes that took two days. In the simulation, it seems like the plane is flying the whole night. Obviously, that is impossible. However, I didn't want to guess what the layover times were so readers will have to interpret very slowly moving nighttime planes as Soviets enjoying Siberian layovers. A similar problem is that one flight (Gorkii-Leningrad) gives what may be too fast a time and zooms by once every couple of days.

Caveats aside, these visualizations reveal interesting aspects of how Soviet transport networks operated, the relationship of the provinces to the center and how technology and distance affected travel in the USSR. Putting together this visualization was made easier because the state-run Soviet civil air system was very centralized, unlike in the United States where there were many commercial airlines, private planes and so on.

The big thing to note about air travel in this period was that its structure was more like long-haul train travel. Of just over a thousand different flights, 822 were part of a route with multiple stops. Of the 284 multiple-stop routes, 107 were two-hops like Moscow-Leningrad-Helsinki and another 72 were three-hops, like Moscow-Voronezh-Rostov-Krasnodar. There were 105 multi-stop routes that had four or more hops, including a route (and its reverse) with ten stops: Moscow-Kazan'-Sverdlovsk-Omsk-Novosibirsk-Krasnoiarsk-Irkutsk-Chita-Takhtamygda-Tygda-Khabarovsk. Many of them also combined passenger service with cargo and mail delivery. Here is a map of the flight network that illustrates this point about the multi-leggedness of Soviet air travel in 1948:


Looking at this map (where thicker lines reflect greater frequency of the route), it would be easy to confuse it with a rail map, except in the few cases where flights go over water. Part of this outcome depended on technological limitations. Among the three planes Aeroflot was using, the Lisunov Li-2 had the best maximum range at 2,500 kilometers. That will not go from Moscow to Vladivostok, but it will get pretty far. Here is a video showing the maximum range of the Li-2 (as well as the city names--a little busy for the visualization, but useful in this case) from Moscow:


The range is actually not so bad, even in a country the size of the USSR. It would have tested the limits of the technology at the time to run a Moscow-Omsk flight but it's likely that distance was only one factor. My guess is that the planes operated like trains from Moscow. Most passengers going to the end of the line probably came from the capital rather than from an intermediate stop. It seems plausible that limited numbers of planes and the lack of people/things to drop off in each regional city made it attractive to run a route from Moscow through many cities to save resources.
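The range claim is easy to check with a great-circle distance. The sketch below uses the haversine formula with a spherical Earth; the city coordinates are approximate values supplied for the example, and the 2,500 km figure is the maximum range quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius, spherical approximation
LI2_RANGE_KM = 2500        # maximum range of the Li-2 as given in the post

# Approximate (lat, lon) coordinates, supplied for illustration
CITIES = {
    "Moscow": (55.76, 37.62),
    "Omsk": (54.99, 73.37),
    "Vladivostok": (43.12, 131.89),
}

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = a
    lat2, lon2 = b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def within_range(origin, destination, range_km=LI2_RANGE_KM):
    """True if a nonstop flight between the two named cities fits inside the range."""
    return haversine_km(CITIES[origin], CITIES[destination]) <= range_km

for city in ("Omsk", "Vladivostok"):
    km = haversine_km(CITIES["Moscow"], CITIES[city])
    print(f"Moscow-{city}: {km:.0f} km, nonstop possible: {within_range('Moscow', city)}")
```

By this measure Moscow-Omsk falls inside the maximum range but without much margin, while Moscow-Vladivostok is far outside it.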

This pattern, with Moscow being THE center, is somewhat different from what was going on in commercial air travel in the US at the time. The comparison is difficult because it was possible to hire a plane in the US in a way that was impossible in the USSR. An ordinary passenger in the US could travel outside the main routes and Soviets could not. I am also not sure how it would compare to US air cargo transit. But looking at passenger routes between major cities is reasonable. For example, to get from Washington DC to Los Angeles in 1951 on American Airlines, you didn't have to fly DC-Nashville-Dallas-Phoenix-LA. It would make more sense to go DC-Tulsa/Chicago-LA, skipping all the smaller cities.

From Airways News
That Moscow was important in the air network of Stalin's USSR is not all that surprising. It's possible to show the centrality of Moscow mathematically in a couple of ways using network theory. All the cities in the network are connected to one another through the network but it takes a varying number of steps. For instance, the longest distance between two points is eleven legs between Vladivostok and Vorkuta. The average number of hops between one city and all the other cities in the network is that city's centrality. (There are ways to weight the distance a flight takes as well, taking into account, for example, the actual distance or the number of flights per week. I talked about this a lot in a previous post about Soviet film networks.)
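As a rough illustration of this measure, the sketch below counts hops with a breadth-first search and averages them. It treats every leg as an undirected, unweighted hop, which is an assumption, and it uses only two routes mentioned earlier in this post rather than the full 1948 network, so the numbers it produces are for the toy graph only.

```python
from collections import deque

# Two multi-stop routes named in the post, used here as a toy network
ROUTES = [
    ["Moscow", "Leningrad", "Helsinki"],
    ["Moscow", "Voronezh", "Rostov", "Krasnodar"],
]

def build_graph(routes):
    """Turn lists of stops into an undirected adjacency map, one edge per leg."""
    graph = {}
    for route in routes:
        for a, b in zip(route, route[1:]):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def hops_from(graph, source):
    """Breadth-first search: fewest legs from source to every reachable city."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        city = queue.popleft()
        for neighbor in graph[city]:
            if neighbor not in dist:
                dist[neighbor] = dist[city] + 1
                queue.append(neighbor)
    return dist

def average_hops(graph, source):
    """Mean number of legs from source to every other reachable city (lower = more central)."""
    others = [d for city, d in hops_from(graph, source).items() if city != source]
    return sum(others) / len(others) if others else float("inf")

graph = build_graph(ROUTES)
for city in sorted(graph, key=lambda c: average_hops(graph, c)):
    print(f"{city}: {average_hops(graph, city):.2f}")
```

A weighted version would swap the breadth-first search for a shortest-path algorithm such as Dijkstra's, with distance or flight frequency as the edge weight.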

Moscow is easily the most central city. Here is a spreadsheet with the centralities. From Moscow, it took an average of 2.3 flights to get anywhere in the network and a maximum of seven flights. Sverdlovsk and Leningrad were not too far behind, both at about two and a half hops. On the whole it took an average of about four flights to get from any random city in the network to any other random city. Here is another good test that shows Moscow's centrality: Close a city's airports, meaning no flights go to or from it, effectively taking it out of the network. What happens to the average number of steps it takes to get anywhere in the network? How many places become inaccessible to the main network? This test really shows how important Moscow was in the network since the average number of flights it took to get from any random point to any other random point went up a half flight (data here). The only other city that has a significant uptick by this measure was Aktiubinsk (Aktobe), which was a shortcut in the network to cities in southern Central Asia. Here is what it looks like when you take Moscow out of the network:
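The removal test can be sketched the same way. The adjacency map below is a toy network built from two routes mentioned earlier in this post, not the full 1948 data, and the two statistics (mean hops over pairs that can still reach each other, and the number of cities left outside the largest connected group) are one plausible reading of the test described above.

```python
from collections import deque

# Toy undirected network: Moscow-Leningrad-Helsinki and Moscow-Voronezh-Rostov-Krasnodar
TOY_GRAPH = {
    "Moscow": {"Leningrad", "Voronezh"},
    "Leningrad": {"Moscow", "Helsinki"},
    "Helsinki": {"Leningrad"},
    "Voronezh": {"Moscow", "Rostov"},
    "Rostov": {"Voronezh", "Krasnodar"},
    "Krasnodar": {"Rostov"},
}

def bfs(graph, source):
    """Fewest legs from source to every city still reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        city = queue.popleft()
        for neighbor in graph[city]:
            if neighbor not in dist:
                dist[neighbor] = dist[city] + 1
                queue.append(neighbor)
    return dist

def remove_city(graph, closed):
    """Copy of the network with one city's airports closed."""
    return {c: nbrs - {closed} for c, nbrs in graph.items() if c != closed}

def network_stats(graph):
    """Return (mean hops over connected pairs, cities outside the largest component)."""
    total = pairs = largest = 0
    for city in graph:
        dist = bfs(graph, city)
        largest = max(largest, len(dist))
        total += sum(dist.values())
        pairs += len(dist) - 1
    mean = total / pairs if pairs else float("inf")
    return mean, len(graph) - largest

print("full network:", network_stats(TOY_GRAPH))
print("without Moscow:", network_stats(remove_city(TOY_GRAPH, "Moscow")))
```

Note that the mean only counts pairs that remain connected, so closing a hub can lower the average simply because the longest trips become impossible and drop out of the calculation. The number of stranded cities has to be read alongside it.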


Moscow was so central that removing all the routes that originate or end in Moscow eliminates half of the cities in the network. The disappearing cities are mostly from foreign flights and the long-haul routes to the Far East. All the cargo only flights disappear, although mixed cargo/mail/passenger flights were run so it is hard to tell if the lack of cargo flights actually says something about the use of aviation for deliveries between Moscow and the provinces. Here is what the network looks like with these Moscow routes removed:


When Moscow is removed, though, the remaining forty-seven cities serviced by non-Moscow routes became more central. Removing routes at the very end of the network means that the average number of flights between random cities goes down (e.g., not possible to go from Leningrad to Vladivostok without the Moscow route, thus the large number of stops it takes are not included). It is also a sign that the Soviet air network existed outside of Moscow. I expected that this interconnectedness between regional capitals was because the summer schedule meant that vacationers from central Russia, Siberia or the Urals were going to the south. However, the cities that should have become more central if that was the case did not gain as much as I expected. Mineralnye Vody is the sixth most central and Adler is the tenth. Simferopol and Krasnodar are even lower.

Much like the US flight network in the 1950s, the Soviet network gravitated toward big cities and stopover points that led to large populations (Tulsa). Thus, the two capitals Moscow and Leningrad were major hubs. But the network also had technical and geographical limitations that partially remain today. The best example is Sverdlovsk. Novosibirsk was about the same size or even a little larger than Sverdlovsk in 1948. With the planes of the time, though, there was no way to reach Novosibirsk without stopping. Moreover, Sverdlovsk was in the middle of the dense, interconnected USSR network and could reach the more populated areas of the country. Novosibirsk was at the very edge of that network. Here is a map of the flight network on top of a population density map (it did not georeference perfectly, unfortunately) of the USSR from the late Soviet period (but still basically reflecting the population density of the 1940s):


So geographic centrality, technology and population density equalled centrality in the flight network. That really hasn't changed a whole lot, although new technology has made previously hard-to-reach cities like Novosibirsk into mini-hubs like Sverdlovsk was (and Ekaterinburg still is today). What has changed is the train-like route system Aeroflot used. No one flies from Moscow to Vladivostok now with ten stops in between. From anywhere within the former USSR, it shouldn't take more than one or two stops to fly to Vladivostok (e.g., Murmansk-Moscow(-Novosibirsk)-Vladivostok). Perhaps mail or cargo may still take these long routes but the passenger experience has become more centralized. To use one example from this network, Aktiubinsk acted as an important node connecting Central Asia with other parts of the Soviet Union. Today, Aktobe's airport has one flight to Moscow and a few to cities in Kazakhstan.

What happened to Aktobe and other cities as mini-hubs? Advances in technology probably obviated the need to use them as a refueling point. But in later flight schedules, too, regional airports tended to have more flights than they do now. The airport at Elista in Kalmykia in 1975, for example, flew to nineteen or so major cities and maybe a dozen minor cities. Now you can only fly commercially from Elista to Moscow. My guess is that when the state was running the country's aviation industry, it was possible to run flights to the corners of the Soviet Union without worrying about making a profit. It was also possible to use the flights for shipping and mail. As post-Soviet airlines commercialized, running a route to Elista or Aktobe stopped being viable. And, of course, now the wealthy can charter jets to provincial cities whereas the USSR's privileged did not have that luxury.

After playing with plane visualizations and network theory, the social-political historian who began this post from the archives has begun to wonder what this all means for the history of Soviet politics, economics or living people who flew in the USSR. Here is where an in-depth analysis of the plane network would benefit from some archival or memoir sources. A big question is--how connected were the Soviet regions to one another without the capital? A pure mathematical answer says they were pretty well connected. But I suspect the experience of Soviet passengers and airline workers would echo Chekhov's Three Sisters: To Moscow! To Moscow! To Moscow! And of course this experiment in mapping the Soviet airline network can say little about how Soviet people and politicians viewed air travel. Was it a revolution in transportation that could take people to entirely new places? This network seems to say that it was more of an evolution, a kind of express train. However, commentary from people at the time would be important to answering that question.

What I am pointing to here is both the promise and limitations of digital humanities projects that work with big data like this one. The problem with just studying an air transport network is that it has no humans. In some ways we are entering the realm of historical transportation geography and graph theory. On the other hand, this network gives an idea of the broader possibilities that ordinary people and authorities encountered, as well as their technical or geographical (social, political and physical) limitations. It can give a sense of the ways that people and things could move in the USSR.

In short, I would love to see someone write a good archival/memoir-based history of Soviet air travel that might serve as a nice companion to Lewis Siegelbaum's book on automobiles in the USSR. But I would also hope that work considers broader structural issues that I have explored here.

Reader, this has been a long post. If you have made it this far, you deserve to enjoy my favorite Soviet song about flying:

Sunday, November 1, 2015

How to Map the Gulag (the visualizing)

The last post was about compiling the data from Memorial's website for a gulag map using Python. Getting the data is the most important part of creating a visualization but it is not as fun as actually making the map. This post will be something that will be easier to follow--i.e., with pictures.

The basic steps for creating a visualization like this are here:
  1. Download, clean and output data (download here and here)
  2. Create visualization
    • Consider potential tools and goals
    • Import data
    • Manage visual effects
    • Create legend
    • Export to video

Tools and Goals

There are different tools for displaying geographical data and each has advantages and disadvantages. What tool to use depends on how you envision the data looking, what features you want to give the audience, how much time you want to put into the visualization and what your technical limitations are.

If you wanted to put a lot of effort into a gulag-data project and know how to create a web application, it might make sense to create an entire website where mapping would be only one element driven by a database of gulag data. This website would give users a high degree of control, letting them search by camp, year, region, camp personnel, documents related to the camps and so on. Memorial just posted something like this for places of incarceration in Moscow. I would love to see a project for the entire gulag along those lines but it would take more data than the Memorial site gives. Moreover, Gulag Maps has already put a lot of maps out there based on the Memorial data. The last knock against this option is that it is difficult and time consuming. A really solid project like this might take months (or longer) and multiple researchers/programmers.

A less demanding option would be to use GoogleMaps or OpenLayers to overlay the camps on internet based maps. We could pull in the data with Javascript or you could upload every month as a layer in Fusion Tables (time consuming). Then, using Javascript (no getting around it) cycle through the dates at a set interval. It would be possible to give users the ability to stop the cycle, click on camps to get more information and to see the camps as a heatmap. There are ways of customizing the way the data is displayed, although using Google or OL will limit your options (especially Fusion Tables) or give you headaches. The learning curve here can also be steep if you don't already have experience with Javascript. Speed is probably the biggest issue. Fusion Tables is able to load layers pretty quickly but loading layers on the fly is not the strong suit of GoogleMaps. Here is a basic version I threw together:




If you click on the Start Slides button, blobs come up where camps are. The placement is right but it is not rendering the heat map like I hoped it would. It also has trouble removing the blobs (possibly a problem with my code but I suspect a problem with loading layers automatically). A slide show won't work, although an option might be to let users pick the dates and get a more in-depth view of each time period.

Another option is to use a GIS program like QGIS, a free and open source GIS software. The disadvantage of this option is that your output is not interactive. The user can't click on the map once you have exported it to an image or a movie. The advantage is that QGIS allows for easier customization and it has plugins like TimeManager, which takes care of cycling through the data. The other advantage is that clips and images can be shared more easily than a web map that needs to pull data constantly from the internet, and the framerate is not dependent on how fast Google or OSM can load the layers.

With these tools in mind, I decided to go with the last. The web application was too much work for too small a payoff. A web-based map is an attractive choice if you think users will want to pan and zoom a lot and to stop on certain sites to look at the data. But the tradeoff was that the map would not load 432 different data sets very smoothly. The effect I was going for was a broad visualization of the gulag's development that would look seamless. For this purpose, having a GIS program generate a series of images to turn into a clip was the best option.

To follow along with the rest of this post, you'll need to install QGIS. Once you have installed the program and opened it, go to the menu Plugins -> Manage and Install. Find the plugin TimeManager (one word) and install it. If you are new to GIS and are weirded out that there is no map in this mapping program, go to the menu Web -> OpenLayers plugin -> OpenStreetMap -> OCM Landscape. (Even if you aren't weirded out, open up the OCM background because we will need it.)


Importing Data

To begin with, we will put together the map where different sized points represent different sized camps. The first thing to do is pan to the former USSR with a scale of about 90,000,000:1 (the scale is at the bottom and can be entered manually if you have trouble panning otherwise). It should look something like this:



The OCM Landscape provides a layer that we will use as a backdrop. It's not perfect, since the borders changed from the 1920s to the 1950s and have changed again since. But this layer will give a basic sense of geography. We will be adding three more layers:
  1. The visualization with the camps derived from the timegulag.csv
  2. A timestamp layer with totals by month derived from gulagtotals.csv
  3. A legend to show the size of camps
Adding csv layers in QGIS is very easy. The sidebar has options for adding all sorts of layers from files and for creating layers as well. What we want is to add a csv so click on the icon that looks like a comma.



This will bring up a new window that gives you options for importing the layer. We need to give QGIS three things in particular: the delimiter (semicolon), where the x (longitude) and y (latitude) are and if the first row contains headings. If the latitude and longitude are in headers x and y, QGIS should set them as x and y by default.


Pressing OK should pull up a map like this, more or less:


All the camps for the entire thirty-six year period are on the map. A good start. Next, we want to import a layer with the totals from the gulagtotals.csv file. This file contains the total number of camps the Memorial data says were open and the estimate of the total population of the camps. QGIS does not make it possible to include a simple label that will change with each month. So we need to give this label its own coordinates that can carry its own data. I put it at 76,15, which puts it just south of Svalbard in the northwest corner.


Keep in mind, that one point isn't actually one point but 432 points stacked upon one another, each linked to the data for the monthly totals.


Managing Visual Effects

Everything has been loaded and can be modified to look like we want. To open up the properties of the main timegulag layer, double click on it in the side menu.


Clicking this button opens up a window with an intimidating number of options. Here we could make a heatmap just as easily but we will make a map with graduated symbols, points that will change with the data. In the first pulldown menu, change Single Symbol to Graduated Symbol.


The data we want to take into account is under the "total" column, so in the new menu that appears, select "total" from the dropdown menu next to Column. There are two ways QGIS will show the size of the data: by color and by size. Differentiating the colors would make sense if the data could distinguish the points by some category, e.g., if we knew the type of work prisoners did we might designate green for forestry, red for construction and so on. Increasing the intensity of color might also make sense for displaying size (e.g., from white for small to red for large), but increasing the size of the points seems more intuitive.

Use the dropdown to change Method to Size. In the Classes box, you can allow QGIS to sort the data for you by setting a number of different classes and hitting Classify. This function is especially useful if you want to use standard deviations or to make sure there are equal counts of points in each category. I have a pretty good sense of the data, though, so I want to make my own categories. Click Add Class four times. This will add four empty categories. Double click on the dots under Symbol to change their size. I made them 1, 2.75, 4.5 and 6.25 but they can be made smaller or larger. In the second column, I changed the values to 0-15000, 15001-50000, 50001-100000 and 100001-1000000. These are also flexible, depending on what you think counts as a big camp. I found that setting the largest camp size at 100001+ gave a nice effect because it meant that only a handful of the very biggest camps met the threshold. I also like to use red for the points so I went to Symbol and changed the color to red. Here is what this menu should look like when you are through:
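To make the class boundaries concrete, here is a small Python sketch of the same classification outside QGIS. It mirrors the four ranges and symbol sizes listed above; the function and constant names are invented for the example, and QGIS does this internally once the classes are entered in the dialog.

```python
from bisect import bisect_left

# Inclusive upper bound of each class and the matching symbol size from the post
UPPER_BOUNDS = [15000, 50000, 100000, 1000000]
SYMBOL_SIZES = [1, 2.75, 4.5, 6.25]

def symbol_size(total):
    """Return the point size for a camp with the given prisoner total."""
    if total < 0:
        raise ValueError("total cannot be negative")
    index = bisect_left(UPPER_BOUNDS, total)  # first class whose upper bound is >= total
    if index == len(UPPER_BOUNDS):
        raise ValueError("total is above the largest class")
    return SYMBOL_SIZES[index]

for total in (0, 15000, 15001, 50001, 100001):
    print(total, symbol_size(total))
```

Running a column of totals through a function like this before importing is one way to check how many camps land in each class and whether the thresholds need adjusting.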


There will now be a bunch of differentiated points on the map from the entire data set, regardless of year. To change this we need TimeManager. There should be a menu below the map itself after you installed TimeManager. If there isn't, go to Plugins -> TimeManager -> Toggle Visibility. The screen should look like this, more or less:



TimeManager is very easy to use. Click settings->Add Layer. In the menu that appears, select timegulag. The plugin will figure out when the data start and end--pretty neat. We need to tell it how often to refresh the data, because TimeManager defaults to minutes. Instead, click the dropdown menu and select months. If you felt like my map was too slow and want to pick up the pace, you can change the interval to three or six months. You could even go by year. When all is done, TimeManager will set the date to January 1924, giving us one point for Solovetskii and kind of an ugly timestamp in the southeast corner.



We want to add a label with the running totals and a nicer timestamp. Double click the layer gulagtotals in the left pane. Here we don't want any point, so change the size in the Style tab from 2 to 0. The point disappears from the map. Go to the Labels tab. Check the box next to "Label this layer with." To get the label we want, we will have to use a formula. Here is what that looks like:

format_date(animation_datetime(), 'MMMM yyyy') +'\nCamp Population: '+to_string(total)+'\nNumber of Camps: '+to_string(camps)

That is the only line of code in this post and it is not so bad. If you need to generate an unusual output with a formula, the QGIS expression editor is good about explaining what the functions do and the bottom of the window prints out a sample output so you can check if you are getting the result you need before putting it on the map. The most difficult part is knowing to use animation_datetime(). It returns whatever date TimeManager is currently using. The function format_date() takes a date (the first variable) and converts it for us to MMMM (meaning months written out in full) and yyyy (years, all four digits). The "\n" means new line and then we print "Camp Population: " followed by the camp population from the total column. Because that column holds numbers, we convert it to a string (characters rather than numbers) with to_string(). Then we add another new line and do the same thing for the number of camps with the camps column. Save the formula. Press OK and your map will look something like this.
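For readers who find Python easier to read than the QGIS expression syntax, the same label can be assembled like this. It is only a rough equivalent for checking what the text should look like: the function name is invented, the numbers passed in below are placeholders rather than values from the data, and month names assume the default English locale.

```python
from datetime import date

def label_text(frame_date, total, camps):
    """Build the three-line label: month and year, camp population, number of camps."""
    return (
        frame_date.strftime("%B %Y")          # e.g. full month name plus four-digit year
        + "\nCamp Population: " + str(total)  # str() plays the role of to_string()
        + "\nNumber of Camps: " + str(camps)
    )

# Placeholder values, only to show the shape of the output
print(label_text(date(1924, 1, 1), 1000, 1))
```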


TimeManager isn't handling the gulagtotals layer like it is for the camps themselves. It is displaying the right date--because it depends on TimeManager for the date--but it isn't displaying the right number of camps or population. Add the gulagtotals layer in TimeManager->Settings. The multiple date labels will become one (the correct one). In the same TimeManager->Settings menu, uncheck Display frame start time on map to remove the ugly timestamp. The placement of our new timestamp isn't great right now and I'm not crazy about the font. Double click on the gulagtotals layer again and go to the Labels tab to play around with the appearance. For a font, I like Source Code Pro, personally, and probably a little bigger. In the same box with Text there is a window for Placement. I put it in the middle upper area and the label looks pretty good on the map. But maybe you would prefer to put it in the Pacific or over Alaska. That is also possible by playing around with the Offset X,Y options. Here is what the map I have looks like after this step.


If you press the start button on TimeManager, the magic starts to happen. Unfortunately, this data set is not great for big changes at the beginning. And TimeManager does not load layers especially quickly with this much data. But it is magic nonetheless!


Creating a Legend

The last step for this map is to create a legend. If we were making a normal map in QGIS, it would be possible to go to Project->New Print Composer and draw on the map. However, that would mean stopping the map each month and printing manually. Not a great plan. Instead, we can create a legend by making a set of points on the map, setting their size to the same size as our camps and labeling them with the sizes of the camps. Click New Shapefile Layer, underneath the CSV layer button on the left pane. The type is Point and it will need a New Attribute of the type Text data. You can call the attribute "label." Save it somewhere on your computer. I'm saving it as "label" and will refer to it as the label layer. Then click on the pencil icon beneath the Project menu to begin editing that layer and make points by clicking the icon with three points on the pencil's right (the Add Features button).


Create four points from north to south where they will be unobtrusive. I think they look nice in the Pacific and they won't be confused for camps there. Give the points IDs 1, 2, 3, 4 and, for labels, 1-15000, 15001-50000, 50001-100000, 100001+ (or other labels if your camps have a different graduated scale). It should look something like this:


Press the save button next to the pencil and click the pencil again to stop editing. Double click the label layer so we can change the visuals. Just like the camps themselves, change the symbol type to Graduated and the method to Size. Here the Column will be id. Add four classes. Make the symbol of each of the four classes equal in size to the corresponding symbol for the camps layer. (Mine are 1, 2.75, 4.5 and 6.25.) Make the Value of the symbols 1-1.9, 2-2.9, 3-3.9 and 4-5, so that each point is covered by one category. In the Label tab, click "Label this layer with" and select "label." Change the font as you want. Hit OK and the map will look something like this:


That looks almost right but it needs some tweaking. Click the pencil again and then the icon with a green blob and an arrow (Move Features). It would be nice if the big points didn't overlap and if the label didn't partially cover Kamchatka. Once the points are in the right spots, click the save icon next to the pencil. The label needs fixing as well, since it is right on top of the points. Double click the label layer in the left pane and click the Label tab. There, select Placement, Offset from Point and set the label to the right. It probably needs to be offset a few degrees to the east as well. I offset X by four degrees and that looks pretty good.


Exporting

TimeManager has a nice function that exports a map for each time segment as an image file. Make sure the slider is at the beginning or set Time frame start to 1924-01-01 00:00:00.000. Click Export Video in the TimeManager menu, set the folder for export and press OK. It will warn you not to play with the map while it is exporting. Don't move the map! TimeManager isn't really doing anything fancy. It basically executes a Print Screen command for the map window every time it loads a new set of data. If you pan onto a different part of the map, TimeManager will export whatever the map looks like afterwards, ruining the clip. It should take ten or so minutes for this data set to export. The more images TimeManager has to save, the longer it will take.

Once TimeManager finishes, we have 432 images (Frame000, Frame001...Frame431). I then used ImageMagick (a command line image editing utility) to crop all the images because so much of Alaska was showing in all the frames. ImageMagick is a great and simple program that allows you to conduct mass edits of an entire folder of images. I suspect some video editing software can crop a sequence, too. But you could create a video without doing this as well. If you are using a Mac, I think you can just drag the images into a new project in iMovie. This is true of Windows Movie Maker, which is mostly adequate but for some reason cropped random images when I tried it. So instead I used VirtualDub, a very basic video editor. (If you end up using VirtualDub, just open the first image and it will load the entire series). With any editor the frame rate probably needs to be faster than the default. I chose somewhere between five and six frames per second (about 180 milliseconds per frame), which gave the video a nice quick pace without being too fast. It might make sense to go slower if you want it to be easier to see the monthly statistics. Then export the project as an avi.
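The frame-rate arithmetic is easy to check. Here is a minimal sketch; the function names are my own, and 5.5 frames per second is simply the midpoint of the five-to-six range mentioned above:

```python
def frame_delay_ms(fps):
    """Milliseconds each frame stays on screen at a given frame rate."""
    return 1000.0 / fps


def video_length_seconds(frame_count, fps):
    """Total running time of a clip built from frame_count still images."""
    return frame_count / float(fps)


print(round(frame_delay_ms(5.5)))              # about 182 ms per frame
print(round(video_length_seconds(432, 5.5)))   # about 79 seconds for 432 frames
```

Halving the frame rate doubles both numbers, which is the trade-off to weigh if the monthly statistics need to be readable.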


Heatmaps

I decided to skip over heatmaps for a bit because creating a heatmap layer is less complicated than creating the pointmap and involves the same steps. A heatmap is a map that shows density by varying the color or opacity of a marker. Making one of these is pretty easy. No legend is needed so the label layer can be unchecked. We could alter the settings of the timegulag layer but better to uncheck it and create a new layer, in case you want to go back to the point map later. Just import timegulag.csv again by clicking the comma button to import csv layer. It would make sense to rename the layer to something like heatgulag to avoid confusion.

Setting up a heatmap is similar to the graduated symbols. Double click on the layer heatgulag. In the menu Style, select Heatmap in the first dropdown. It will automatically choose a white-black scale that needs to be changed. Click Edit, select whatever color you like as color 2 and as color 1 choose transparent--otherwise this layer will cover the entire map with color 1. Type is "continuous" or else the heatmap points will come with weird borders. Weight the points by "total," the size of the camp, or else the map will weight each camp equally. Set render quality all the way to Best so that the heatmap points are not pixelated. Just like the graduated point map, TimeManager will print the heat map over time and you can compile it into a video.


This walkthrough was long but not difficult. There was only one line of code, really. I didn't talk about how to add cities to the map but anyone who followed along should be able to figure out how to create a new layer, add points and style the city labels as needed. Just like adding cities, you could juxtapose other layers as well--natural resources and camps, total population and prisoner density, transportation routes and camps.

GIS provides powerful ways of displaying data with a gentle learning curve. Time lapse maps in particular are a good way of showing change over time and making them is not difficult with QGIS and TimeManager. Not only are they capable of doing these big picture type visualizations over large periods of time but they are also good for doing reconstructions of fine grain historical data. For example, here is a little preview of what I am working on now, mapping the flight network of the USSR:


video

Tuesday, October 27, 2015

How to Map the Gulag (the data)

I had a few people ask how I made the gulag videos--what kind of tools and time were involved. So this writeup is not about the gulag itself but about how I made the last map video. The last post was intended primarily as something I might show my students to illustrate the development of the camp system. What I want to do here is present something that scholars or digital history students could use to think about how one might make a map like this. This isn't a walkthrough since I doubt anyone is willing to put in the time I did to generate the data. Some code is included, but far from all of it. But for people interested in doing digital history, it may be useful to see the process and to get a sense of the kind of coding that is necessary to get usable data from a set of web pages.

Any project like this can be divided broadly into two steps with some substeps:
  1. Getting data
    • Downloading
    • Sorting
    • Cleaning
    • Geocoding
    • Extrapolating
    • Outputting
  2. Visualizing (for next post)

Getting Data

A dynamic map needs a set of time-staggered geographical data. These data could be nuclear detonations (Isao Hashimoto), eighteenth and nineteenth century shipping routes (Ben Schmidt) or the shifting borders in Europe since 4000BC (some YouTube user named MrOwnerandPwner who makes a lot of these maps). The last two maps are a little more complicated to make because they use lines and polygons to visualize data. Points are less problematic since the minimum you need for each data point is a timestamp and coordinates. If the data have a population or other categories, it is possible to weight the points or otherwise differentiate the visualization by category, for example with colors to show what kind of work prisoners did. The gulag is a good example since it has points and also because all the data are available on Memorial's website (click on the Лагуправление header).

The easiest way to import a point map into a GIS program is as a delimited file, often called comma separated values or csv. This is just a text file that duplicates the functions of a spreadsheet by using a comma (or tab, or semi-colon or whatever else) to separate the cells. The first row usually contains the category names. Ultimately, the csv for a map like this will consist of many thousands of rows like the first in the csv for the gulag map, which looks like this (available here):

date;name;y;x;size
1924-01;SOLOVETsKII ITL OGPU;65.028257;35.717330;3531
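
As a quick check that a file in this format parses cleanly, here is a minimal sketch using Python's built-in csv module. It is written for Python 3 (the rest of the code in this post is Python 2), it reads the two lines above from a string rather than from a file, and the variable names are only illustrative:

```python
import csv
import io

sample = "date;name;y;x;size\n1924-01;SOLOVETsKII ITL OGPU;65.028257;35.717330;3531\n"

# The delimiter is a semi-colon, so it has to be named explicitly.
reader = csv.DictReader(io.StringIO(sample), delimiter=';')
rows = list(reader)

first = rows[0]
print(first['name'])                          # SOLOVETsKII ITL OGPU
print(float(first['y']), float(first['x']))   # latitude, then longitude
print(int(first['size']))                     # 3531
```

Note that y holds the latitude and x the longitude, which is the order a GIS program will ask about when importing the file.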


So how do we get this nice looking (technically speaking) data? Memorial's gulag site was written in HTML4 and uses frames to keep the header and sidebar while loading the camps in the central frame. If you click on the side link for the Solovetskii ITL, it will load Memorial's information on that camp. Although you can't tell from the url, each of the camps is a different page. Here is the page for the Solovetskii camp:



There are a few ways to get the data out of this collection. It would be possible to go through all the camps and copy and paste the information manually into a spreadsheet. But there are 475 camps and copying every entry would be very time consuming. A better way of getting the data is with programming, by finding patterns in the data structure and using a scripting language to format the information in a way that is usable. This is also known as data scraping.

Learning how to scrape data with Python or another language takes some time and each page is different. I am going to post code here and it will be comprehensible if you are familiar with programming, even if you don't know Python. If you are just beginning, The Programming Historian has several lessons under the Data Manipulation heading (including mine) that go through the Python syntax needed for scraping. That site also has information about installation. I also like Codecademy lessons if you want practice with Python.


Downloading

The url for each camp follows the pattern: http://www.memo.ru/history/nkvd/gulag/r3/r3-X.htm with X being a number between 1 and 475. If you know this, you can use Python's urllib2 module to download all the camp pages very quickly. Python's BeautifulSoup module parses the data, making it possible to find a part of the page by its HTML tag. If you wanted to download and store a parsed version of the HTML for all the sites into a Python list, you would use this code (note, it will take a few minutes to run the last line):

import urllib2 
from bs4 import BeautifulSoup 
pages=[BeautifulSoup(urllib2.urlopen('http://www.memo.ru/history/nkvd/gulag/r3/r3-'+str(number)+'.htm').read()) for number in range(1,476)]
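
One caveat: urllib2 exists only in Python 2. In Python 3 the same functionality lives in urllib.request. The sketch below builds the list of camp URLs from the pattern above and wraps the download in a function, so nothing is fetched until you call it. The function names are my own, and naming the 'html.parser' parser explicitly is a choice made for this sketch, not something the code above does:

```python
from urllib.request import urlopen

BASE = 'http://www.memo.ru/history/nkvd/gulag/r3/r3-{}.htm'


def camp_urls(first=1, last=475):
    """Return the URL of every camp page, numbered first..last inclusive."""
    return [BASE.format(number) for number in range(first, last + 1)]


def download_pages(urls):
    """Download and parse each page (needs network access and bs4 installed)."""
    from bs4 import BeautifulSoup  # imported here so camp_urls works without it
    return [BeautifulSoup(urlopen(url).read(), 'html.parser') for url in urls]


urls = camp_urls()
print(len(urls))    # 475
print(urls[0])      # http://www.memo.ru/history/nkvd/gulag/r3/r3-1.htm
```

Checking the list of URLs before downloading is a cheap way to catch an off-by-one error in the range.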


Sorting

The name of the camp is the first text on the page and it is the first text inside a bold <b> tag. It's possible to create a dictionary, an "associative array" that keeps a list of data under headers, where each camp itself is a dictionary with information. Here is the code for that:
camps={camp.b.text:{} for camp in pages}

Now if you entered camps['СОЛОВЕТСКИЙ ИТЛ ОГПУ'], it would pull up an empty entry from that dictionary. Memorial gives data in a set of tables where the first table cell <td> is the category heading and the data themselves are in the second cell. For Python, the first cell is actually cell zero. The data needed for this map relate to time, location and size: Time of Operation (Время существования), Location (Дислокация) and Size (Численность). On my computer--and my understanding is that this is true for English-language Windows in general--Python does not like Cyrillic. I often transliterate the entire text (see my Programming Historian lesson). But in this case I needed to keep the Cyrillic because Yandex likes it better for Geocoding. Instead I used unicode, a set of codes that symbolize letters, to create the variables. Each of the funny \u0441 codes symbolizes a Russian letter. The variable names should let you know what they mean:

timeofoperation=u'\u0412\u0440\u0435\u043c\u044f \u0441\u0443\u0449\u0435\u0441\u0442\u0432\u043e\u0432\u0430\u043d\u0438\u044f:'
size=u'\u0427\u0438\u0441\u043b\u0435\u043d\u043d\u043e\u0441\u0442\u044c:'
location=u'\u0414\u0438\u0441\u043b\u043e\u043a\u0430\u0446\u0438\u044f:'

for camp in pages:
    campname=camp.b.text
    for datapoint in camp.find_all('tr'):
        category=datapoint.find_all('td')[0].text
        entry=datapoint.find_all('td')[1].text
        if category in [timeofoperation,location,size]:
            camps[campname][category]=entry
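
If you want to confirm what the escape codes stand for, or generate them for another heading, Python can convert in both directions. This is a minimal sketch for Python 3, where every string is already Unicode; the helper name to_escapes is my own:

```python
def to_escapes(text):
    """Turn non-ASCII characters into \\uXXXX escape codes."""
    return text.encode('unicode_escape').decode('ascii')


size = '\u0427\u0438\u0441\u043b\u0435\u043d\u043d\u043e\u0441\u0442\u044c:'
print(size == 'Численность:')   # True: the codes spell the Russian heading
print(to_escapes('Время'))      # \u0412\u0440\u0435\u043c\u044f
```

The second line of output is the first word of the timeofoperation variable above.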


Cleaning

All the data are now in the Python variable camps. However, the data need to be cleaned, meaning that we need to make the information uniform so that a computer can read it into a program. Cleaning data is in many cases, including this one, the most time consuming and tedious part of putting together a visualization.

The Memorial data are very thorough but very messy. I get the feeling someone typed it out by hand and occasionally made little mistakes that are difficult to catch with a program. At this point, it might be easier (although longer) to print out the data we have, put it into a spreadsheet and edit the 475 camps in Excel or using regular expressions (patterns for identifying text) and copying the coordinates from a website. What I did was look for patterns in the text with Python, removing what I didn't need and regularizing the rest. I am not going to go through the code here because it would take too long. Instead, I will identify the challenges of cleaning the data in broad terms:
  1. The size of the camp:
    • The entries for a camp's size at a given date usually have a regular pattern that looks like this "(xx.)xx.xx — xx xxx" (e.g., 01.01.30 — 53 123). Python's regular expression module (re) can help find these patterns and capture these data points. However, there are some camps where the data are formatted differently and we need to get that information out as well.
    • I searched with the "xx.xx — xx xxx" pattern on the data to put most of the camp size information into groups of date-size that are easier for Python to read. For camps where the formatting follows a different style, I searched with a different pattern. When only a handful of entries remained, I looped through and entered the numbers manually.
  2. The date the camp opened and when it closed:
    • Most entries include a phrase "Organized xx.xx.xx" and "Closed xx.xx.xx" but some were reopened and closed again. How can we deal with that problem? 
    • I searched with the "Organized xx.xx.xx" and "Closed xx.xx.xx" patterns to get the majority of the operational dates of camps. For camps that reopened at some point, I included an entry in the data for the camps' size that made the size of the camp zero when it was temporarily closed.
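
To give a flavor of the first step, here is a simplified sketch of the size pattern rather than the full cleaning script. It is written for Python 3 and assumes the separator is a long dash (written as the escape \u2014 in the code), that thousands are separated by ordinary spaces, and that every two-digit year belongs to the 1900s. The names SIZE_PATTERN and parse_sizes are hypothetical, and entries that break these assumptions still need their own patterns or manual entry:

```python
import re

# Optional day, then month and two-digit year, a long dash, then a number
# whose thousands are separated by spaces, e.g. 01.01.30, dash, 53 123.
SIZE_PATTERN = re.compile(
    r'(?:(\d{2})\.)?(\d{2})\.(\d{2})\s*\u2014\s*(\d{1,3}(?: \d{3})*)')


def parse_sizes(text):
    """Return (YYYY-MM, prisoners) pairs found in a camp's size entry."""
    pairs = []
    for day, month, year, number in SIZE_PATTERN.findall(text):
        date = '19{}-{}'.format(year, month)   # assumes years in the 1900s
        pairs.append((date, int(number.replace(' ', ''))))
    return pairs


entry = '01.01.30 \u2014 53 123, 01.31 \u2014 1 000'
print(parse_sizes(entry))   # [('1930-01', 53123), ('1931-01', 1000)]
```

The day is captured but dropped, since the map only needs monthly resolution.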

Geocoding

The locational information is dirty but good enough for Yandex, which will make mistakes no matter what it is given. In an ideal world we would feed Yandex locational information formatted the way you are supposed to write an address on mail. For example, my work address: Russia, Moscow, Ulitsa Petrovka, 12. Yandex knows exactly what to do with this. This is not an ideal world, though, so we are feeding Yandex the addresses we have. For example, the first camp alphabetically, the Automobile-Transport Camp of Dalstroi, has the following address with a source citation: Magadanskaia oblast, pos. Miakit {21. l. 667, 733}. But Yandex can sometimes work miracles and if you enter that address, Yandex returns the exact village we need. Other times Yandex will provide an address that is wildly mistaken (see my previous post). What I did is semi-automate the geocoding with the module Geocoder. I looped through the camps, getting Yandex's geocodes. Then I printed the proposed latitude and longitude to myself and plugged the coordinates into Yandex. If they were okay, I could go to the next camp. If they weren't okay, I could try different locations to put into Yandex until I was satisfied. The code looks something like this:

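import geocoder
for camp in camps:
    # use a new name so the location key defined earlier is not overwritten
    address=camps[camp][location]
    okay='no'
    while okay=='no':
        lat,lng=geocoder.yandex(address).latlng
        # the coordinates are numbers, so convert them before joining
        print str(lat)+','+str(lng)
        okay=raw_input('Is this okay?')
        if okay=='no':
            address=raw_input('What location to try?')
    # store the accepted coordinates under fixed string keys
    camps[camp]['lat']=lat
    camps[camp]['lng']=lng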


Going through all the camps and checking the coordinates was time consuming. Historical addresses are difficult because towns disappear and streets received different names--especially in the former USSR. If a project is too large, it may be impractical to go through coordinates semi-manually as I did. But of course, a project like this is only as good as the geographical data it uses so checking the coordinates is not time wasted.

Extrapolating

Some statistical sets may have a data point for each time increment you want to map. That is not the case for the gulag. The data are sporadic. Some camps have a few points per year and others have one for their whole existence. I had to think about how to handle the time increments. Did I want to print the map yearly? Monthly? Weekly? Daily? Ultimately, it made the most sense to print monthly since there was too little change if I incremented on a weekly or daily basis and there was too much change in a data set with yearly increments.

Choosing a monthly increment raised the problem of extrapolating for camps that existed during a given month but where Memorial offered no specific data. I explained in the previous post that my approach was to take an average of the nearest point before and after where there was data. For example, if a camp existed from 01.1930 to 01.1932 and I had data points for 01.1930 (500 prisoners) and 01.1931 (1000 prisoners), for 02.1930 to 12.1930, I would take the average of the two data points (750 prisoners). In the months after 01.1931, I would keep the same number of prisoners as in 01.1931 until the camp closed. There are probably more sophisticated ways of doing this. I could have weighted so that 12.1930 was closer to the figure from 01.1931. Or I could have been more of an interventionist, adjusting my formula so that it favored lower figures for periods when I knew the gulag population was lower (e.g., during the war and after Stalin's death). I would be more comfortable with the former approach (more objective) than the latter (subjective). In any event, there is no way that the Memorial data would ever give the real number of gulag prisoners and I was satisfied that the data I got approximated the general movement of the gulag population.

The important question becomes how to get this data with a script or manually. I wrote a function in Python that returned a camp's population where there was a data point or an extrapolation in the case where the Memorial data had none. But it would also be possible to pull the data into a spreadsheet at this point and use Excel formulas to fill in the rest.
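
Here is a minimal sketch of that rule, not the function I actually used. It assumes dates are zero-padded 'YYYY-MM' strings (so they sort correctly as text), it rounds the average down, and it uses the first recorded value for any month before the first data point, a case the rule above does not cover. It does not check whether the camp was open. The name estimate_size is hypothetical:

```python
def estimate_size(points, month):
    """Estimate a camp's population for a month written as 'YYYY-MM'.

    points maps 'YYYY-MM' strings to recorded prisoner counts.
    """
    if month in points:
        return points[month]
    earlier = [m for m in points if m < month]
    later = [m for m in points if m > month]
    if earlier and later:
        # Average the nearest data points before and after the month.
        return (points[max(earlier)] + points[min(later)]) // 2
    if earlier:
        # After the last data point, carry its value forward.
        return points[max(earlier)]
    # Assumption: before the first data point, use the first value.
    return points[min(later)]


points = {'1930-01': 500, '1931-01': 1000}
print(estimate_size(points, '1930-06'))   # 750
print(estimate_size(points, '1931-06'))   # 1000
```

Weighting by distance in time, the more sophisticated option mentioned above, would replace the simple average in the first branch.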


Outputting

Once the data are cleaned and geocoded, you can export from Python to a .csv file. Or if you are using Excel, you can save to a .csv or .txt file. Excel can do some funky things with Cyrillic so it could make sense to copy and paste the entire spreadsheet from Excel into a programming text editor like Notepad++ (with encoding set to a common standard like UTF8), which would create a tab-delimited format where the tab separates the categories. What I did was loop through all the camps in each month of each year, running a function that checks that the camp was open. If it was open, the script gets the size of the camp. If the camp had any prisoners, the script writes the date, the camp, its latitude, longitude and size to a file using semi-colons as delimiters. That code looks something like this:

[Edit: I forgot that you will also need the entire prisoner population that Memorial's data gives and the total number of camps. This csv will be needed for the next post. I have added the code below.]

with open('C:\\Your Path...\\gulag.csv','w') as f:
    f.write('date;name;y;x;size')

#ADDED IN EDIT
with open('C:\\Your Path...\\gulagtotals.csv','w') as f:
    f.write('date;camps;total;x;y')
#END EDIT

for year in range(1924,1960):
    for month in range(1,13):
        #ADDED IN EDIT
        campcount=0
        totalsize=0
        #END EDIT
        # use the loop's year (not a fixed 1924) and zero-pad the month, e.g. 1924-01
        date='%d-%02d' % (year,month)
        for camp in camps:
            campdata=camps[camp]
            if campopen(campdata):
                size=getsize(campdata,date)
                if size>0:
                    with open('C:\\Your Path...\\gulag.csv','a') as f:
                        # assumes the coordinates are stored under the string keys 'lat' and 'lng'
                        f.write('\n'+';'.join([date,camp,str(campdata['lat']),str(campdata['lng']),str(size)]).encode('utf8'))
                    #ADDED IN EDIT
                    campcount+=1
                    totalsize+=size
        with open('C:\\Your Path...\\gulagtotals.csv','a') as f:
            f.write('\n'+date+';'+str(campcount)+';'+str(totalsize)+';'+'15;76')
            #END EDIT
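
A quick way to check the loop bounds is to generate the month labels on their own. Because range(1924,1960) stops at 1959, the loop covers 36 years of 12 months each. This minimal sketch (the function name is my own) zero-pads the month so the labels match the 1924-01 format of the sample row:

```python
def month_labels(first_year=1924, last_year=1959):
    """Return one zero-padded 'YYYY-MM' label per month, in order."""
    return ['%d-%02d' % (year, month)
            for year in range(first_year, last_year + 1)
            for month in range(1, 13)]


labels = month_labels()
print(len(labels))             # 432
print(labels[0], labels[-1])   # 1924-01 1959-12
```

The count of 432 is the same as the number of frames the map video is built from.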


This post should have given a taste of how useful programming can be in generating data for historical research and visualizations. With about fifteen lines of Python code you can download an entire set of web pages and extract the (very, very dirty) data into a spreadsheet. All-in-all it took about ten hours to get, clean and export the data using Python, including trial and error and tea breaks. Cleaning the data was what took the most time by far. If you generated the data manually in a spreadsheet after downloading with Python you are probably looking at about forty hours of work (5 minutes average per camp * 475 camps), which might be reasonable if the data are important for your project but I would rather put the extra hours into learning Python. Various projects by Memorial provide especially good sources to scrape this kind of raw data (here for example, a list of several million victims of Stalinist repression). In the next post, I'll write about how to take this csv and turn it into a map.

Monday, October 19, 2015

Mapping the Gulag over Time

From the 1920s to the end of the 1950s, the Soviet government ran a brutal system of camps that came to be known by its acronym, gulag (Chief Administration of Camps). The gulag has been on my mind lately because I picked up Alan Barenberg's Gulag Town, Company Town (see a series of commentaries on the book by its author and specialists here) and also because the latest issue of the journal Kritika carried a series of articles about the camp system. With all this new information coming out about Soviet prison camps, it struck me that there is an opportunity to produce some digital content as well. I have also been thinking of data sets to use with QGIS, a powerful, open source mapping program, and Soviet forced labor provides a good one in many ways. While there was an entire project dedicated to producing gulag maps, it doesn't really take advantage of all the possibilities the data and technology present. Instead, I created a couple video maps from gulag data.

I'll explain what these visualizations are before analyzing what they mean. Using Python, I took the data from the Russian human rights organization Memorial's project on the gulag. I used only the data from the Camp Administrations (Lagupravleniia) tab, since these include individual camps rather than entire camp systems. These camps were only part of the gulag administration that was itself just part of the Soviet policing apparatus (OGPU-NKVD-MVD). The gulag administration ran a vast prison empire that included colonies for juvenile delinquents, ordinary jails and "special settlements" for exiled dekulakized peasants and supposedly hostile national groups. My visualizations only include what might be considered the "classic gulag," the "correctional labor camps" and transit camps of Solzhenitsyn's Ivan Denisovich or Shalamov's Kolyma Tales.  From the entries of 475 individual camps, I pulled the dates of operation, number of prisoners and geocoded the location. This allowed me to create 432 maps like this:




Then using QGIS and a plug-in called Time Manager, I plotted the points over time on a map, creating an image for each month between January 1924 and December 1959.

The first video contains heat maps showing the density of labor camp prisoners:




The second video contains point maps showing the size of camps:





These videos crystallize much of the new research on the gulag. The works of Barenberg, Wilson Bell (here as well) and others are showing that prisoners had far more contact with the outer world than we previously thought. Prison camps were often near towns, prisoners associated with guards and former prisoners were effectively forced to take up residence in the camp town. Although no one is trying to diminish the brutality of the forced labor system, new research suggests that the archipelago metaphor is not accurate. As economist Tatiana Mikhailova shows, cities formed around the gulag itself. And these maps allow us to see that very dense populations of prisoners were relatively close to cities--even Moscow. Moreover, during the final years of Stalin's reign, to March 1953, camps were everywhere. At the same time, it is worth pointing out that the rather dull first ten seconds of the visualizations give a clue as to why Solzhenitsyn's description endured. The main camp for political prisoners until 1929 was Solovetskii Special Purpose Camp, the island-bound prison north of Leningrad. Although the gulag system changed dramatically after 1929, it is worth remembering that this iconic image of Soviet forced labor--like that of many other aspects of the USSR--comes from the 1920s.

My data do a good job of approximating the total number of gulag prisoners at a time. The data set isn't perfect, of course. Memorial's camp data give the number of prisoners on a non-uniform basis. For example, the Birskii camp in Bashkortostan existed from April 1939 to January 1942. However, it only has five data points for that period. Clearly its population changed more than five times and rather than try to guess what it was in unlisted months, I averaged the number of prisoners in months in between. For February through June 1940, my map gives its population as 12,063, the average of its January 1940 population (12,866) and its July 1940 population (11,261). This approximation is problematic during WWII, when the map displays some camps as existing very close to German-occupied territory. My guess is that the NKVD created or reformed camp administrations in advance of the creation of camps on these territories and that my calculation picked up the first data point after the camp population returned. It also means that some of the camp data from February 1953 reflect the amnesty of prisoners in March 1953.

Rather than presenting the gulag's own summary statistics like Getty, Rittersporn and Zemskov did in this article, I tabulated the total number my camp data gave, warts and all. Nonetheless, it comes awfully close to the summary statistics from that article. More importantly, it captures the trajectory of the major expansions and contractions of the gulag over the period:

1924-1929: Limited camp system
1929-1933: Expansion based on wave of collectivization repression
1935-1939: Expansion based on political and social repression in the Great Terror
1941-1945: Contraction during war as prisoners join army or die during famine conditions
1948-1953: Late Stalinist expansion to largest camp system
March 1953-1956: Contraction during post-Stalin amnesty and destalinization

A final point that these maps hammer home is that the gulag system sent people to the far reaches of the USSR but it also had a huge footprint in European Russia. Punishment and proactive incarceration of "anti-Soviet elements" were the main motives behind mass repression under Stalin. However, the Soviet Union had a labor-hungry economy and construction sites and factories throughout the country demanded laborers. Research like Nick Baron's on forced labor in Karelia or James Harris's on the Urals shows the large role that forced labor played in the planned economy. Prisoners were famously the key force in building the Moscow-Volga canal. From this data set, it is clear that the gulag increasingly became a tool of settlement in territories far from the populated European territories. However, it should be equally visible that the gulag remained a labor source for the territories that were already relatively developed.

Those are my thoughts on the videos. I will probably write up a little explanation of how I made the maps because it is not difficult to do if you have time staggered geographical data. If anyone wants to play with the numbers, the totals are available in csv form here and the month-by-month, camp-by-camp csv here. Comments are welcome--especially suggestions for music as backing tracks!

Wednesday, September 30, 2015

Geocoding with Yandex

For the last post I created a map with the birthplaces of Soviet prisoners in Germany based on a German-Russian database. I did this using a Python module called Geocoder. There are three nice features of this module compared to others I have used. First, other modules throw errors if Google (or Yandex etc) cannot find the location. Instead, Geocoder creates an empty object. When processing thousands of locations, not having to restart the script after an error is a big plus. Second, it interfaces with all of the major GIS services (Google, Yandex etc.) and it is easy to code. Third, it somehow gets around using an API key (i.e., registering with Google etc. and remembering the twenty digit encrypted key) in its queries. For novice programmers who need to Geocode lots of places, this is a great module.

The danger of using an automated script to geocode, though, is that Google and Yandex don't know where everything is, and might even give bad results. Companies have different strategies for providing results. Yandex is aggressive about providing coordinates for a query compared to Google. For example, in the last post for Soviet prisoners, I had about 270,000 locations I wanted to find, mostly in the former USSR. I ran a set through Google and Yandex, with the latter pulling results for way more. I assumed that Yandex has better GIS data for the former Soviet Union so I had it do the entire list. It pulled most of the results, something like 250,000. The problem, though, is that Yandex aggressively autocorrects. For a village named Koromenskaia, Yandex assumed I meant Kolomenskaia, a metro station in Moscow. In other cases, Yandex understood Village X, Voronezh Province as Voronezh Province and geocoded to the center of that province.
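
One inexpensive sanity check for the second kind of error is to look for coordinates that many different queries share, since a geocoder that falls back to the center of a province will return the same point for every village in it. This is only a sketch: the function name, the threshold and the sample coordinates are all made up for illustration, and it will not catch autocorrections like the Kolomenskaia one.

```python
from collections import defaultdict


def suspicious_coordinates(results, threshold=3):
    """Return coordinates shared by at least `threshold` different queries.

    results maps each query string to a (lat, lng) tuple.
    """
    queries_at = defaultdict(set)
    for query, latlng in results.items():
        queries_at[latlng].add(query)
    return {latlng: sorted(queries)
            for latlng, queries in queries_at.items()
            if len(queries) >= threshold}


# Placeholder results with invented coordinates, not real geocoder output.
results = {
    'Village A, Voronezh Province': (51.0, 40.0),
    'Village B, Voronezh Province': (51.0, 40.0),
    'Village C, Voronezh Province': (51.0, 40.0),
    'Village D, Voronezh Province': (50.5, 39.2),
}
print(suspicious_coordinates(results))
```

Any coordinate that comes back from this check deserves a manual look before it goes on a map.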

For a map of monuments uploaded to the memorial cataloging website Pomnite-Nas, I used Google and was fortunate to have only a few hundred that, when placed on a map, were clearly inaccurate. I corrected those few hundred--painstaking work but possible with that number. But with the prisoner map, it was unclear how many listings were inaccurate, if not totally incorrect. With tens of thousands of incorrect listings, it was impossible to know or correct.

I still think Yandex is worth using for people working on post-Soviet republics. For example, I just used Yandex to find a place called Miagit in Magadan province after Google failed. But geocoders should exercise caution when using any of these services or they might find themselves with false results.

Monday, September 21, 2015

Notes from a Database of Soviet Prisoners of WWII


I have been thinking about POWs and forced laborers lately. My article on digital memory of World War II came out with Memory Studies and in it I analyzed projects like the Russian government's OBD-Memorial, a huge database of Soviet soldiers who died in the war. I've also been working on a project on the repatriation of Soviet citizens after World War II, including prisoners of war. Part of the challenge of this project is identifying who repatriates were--who was likely to end up as a forced laborer or POW in the war and how did that affect the experience of imprisonment and return to the USSR.

In thinking about this issue, I started looking at available data on prisoners of war. OBD-Memorial hides its data behind a web app, making it impossible to analyze the database. However, I found a database of Soviet prisoners here, run by the Center of Documentation of the Saxony Memorial for the Victims of Political Terror. The database (in Russian or German) includes basic data on each prisoner (name, date of birth, birthplace, nationality, date of death) culled from Wehrmacht documents in former Soviet and German archives. The site says it includes prisoners from "the territory of the former German Reich."  In total, the database includes 881,035 entries, which is a substantial number of the Soviet soldiers taken prisoner. The German estimate is 5-5.6 million and Russian state's estimate is roughly 4.5 million. The difference as I understand it depends on whether non-combatants should be counted as prisoners, since the German army took officials, partisans and civilian men as war prisoners in addition to Red Army soldiers.

In any event, this data is not complete and it is unclear how representative it is of Soviet POWs generally. Among the total population in the database, 50.8 percent (447,642) have a date of death registered and the others presumably survived. This percentage is lower than German historian Christian Streit's estimate of the overall mortality rate of prisoners (57 percent). Of prisoners listed as Jews, only 33.9 percent died, which is unbelievably low given that Pavel Polian says between 65 and 95 percent of Jews died. (And he believes the higher estimate is correct.) Another anomaly of the POWs in this database is that just 1,087 of the prisoners are listed as Jewish. Polian says that there were 85,000 Soviet-Jewish POWs total, making them a minimum of 1.5 percent of the total population that was captured, whereas the Jews in this database are just 0.2 percent. It is possible the database's Jewish population was just those who survived until camps in Germany, which is just about right for Polian's figures. It is worth speculating that those who had somehow survived the initial period of systematic murder in occupied territory might have had a higher likelihood both of survival in labor camps (bringing the mortality rate down) and of hiding that they were Jews (bringing the Jewish population as a percent of the total down).
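
Two of these shares can be rechecked from the figures already given. This is a minimal sketch with a helper name of my own. The second line uses 5.6 million, the top of the German estimate, because the largest total gives the smallest possible share:

```python
def percent(part, whole):
    """Share of whole taken up by part, as a percentage to one decimal."""
    return round(100.0 * part / whole, 1)


print(percent(447642, 881035))    # 50.8, entries with a date of death
print(percent(85000, 5600000))    # 1.5, Jewish POWs against 5.6 million captured
```

With a smaller total, such as the Russian state's estimate of roughly 4.5 million, the same 85,000 would be a larger share, which is why 1.5 percent is a minimum.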

So it is possible that these prisoners are some (but not all) of those who survived to be interned in Germany. Of course, the very fact that these prisoners made it into the database might indicate that there was something else that made them not a representative sample even within the population of Soviet POWs in Germany. But let's assume for a minute that these figures are somehow representative of the broader population of Soviet prisoners in Germany or perhaps even soldiers overall. What does this data say about who was more likely to live or die in camps? Where were soldiers from?

Nationality: Of the total population, 540,707 had some nationality listed. For each national category, I pulled the total number of prisoners, the number who died in captivity and the number who appear to have survived their captivity because their date of death is not listed. Rather than posting the results here, I uploaded a spreadsheet with the groups that had more than a hundred total prisoners. I calculated the number of prisoners who survived over the number who died for each nationality and the differential from the average. The results are interesting: Russians made up the majority of those captured and survived at an average rate. Ukrainians were the second largest contingent and also survived at an average rate. The nationalities that survived at a disproportionately high rate in the database were Belorussians, Kalmyks, Chechens and Jews.
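For readers who want to repeat the calculation, here is a minimal Python sketch of the ratio described above. It is not the spreadsheet itself: the group names and counts are placeholders invented for illustration, and "the average" is assumed to mean the survived-to-died ratio across all groups that pass the size cutoff, which is only one possible reading.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    total: int  # all prisoners listed under this nationality
    died: int   # those with a date of death registered

    @property
    def survived(self):
        # No date of death is taken to mean the prisoner survived
        return self.total - self.died

    @property
    def ratio(self):
        # Survivors per death; infinite if nobody in the group died
        return self.survived / self.died if self.died else float("inf")


def differentials(groups, min_total=100):
    """Map each group with more than `min_total` prisoners to the
    difference between its ratio and the ratio over all kept groups."""
    kept = [g for g in groups if g.total > min_total]
    overall = sum(g.survived for g in kept) / sum(g.died for g in kept)
    return {g.name: g.ratio - overall for g in kept}


# Placeholder counts, NOT taken from the database
sample = [
    Group("Group A", total=1000, died=500),
    Group("Group B", total=300, died=100),
    Group("Group C", total=50, died=25),  # dropped: not more than 100
]
result = differentials(sample)
for name, diff in result.items():
    print(f"{name}: {diff:+.3f}")
```

A positive value means the group survived at a higher rate than the population as a whole, a negative value a lower one.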

If we are thinking more broadly about the geography of the Red Army, this database might also have some interesting revelations. Presumably all soldiers were equally likely to be taken prisoner by the Germans, and so the distribution of the prisoners should resemble that of the Red Army overall. I used the excellent Python module Geocoder (more on this in a separate post) to get the coordinates for most of the locations given as the prisoners' place of birth, about 740,000 of the nearly 900,000 entries. If we average the locations, we end up with a soldier who is from somewhere in the middle of Saratov province, near the border with Kazakhstan. Seems plausible.
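The post does not say how the locations were averaged. One common approach, sketched below as an assumption rather than as the method actually used, is to turn each latitude and longitude into a unit vector on a sphere, average the vectors, and convert the result back to coordinates. Over an area the size of the USSR this can differ noticeably from a plain mean of the latitude and longitude columns. The function name and the two city coordinates (rough values for Moscow and Kiev) are illustrative.

```python
import math


def mean_location(points):
    """Average (lat, lon) pairs given in degrees by averaging
    their unit vectors on a sphere."""
    if not points:
        raise ValueError("need at least one point")
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    # Convert the mean vector back to latitude and longitude
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


# Rough coordinates for Moscow and Kiev, for illustration only
cities = [(55.75, 37.62), (50.45, 30.52)]
center = mean_location(cities)
print(f"mean location: {center[0]:.2f}, {center[1]:.2f}")
```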


Let's look at the map with all the prisoners:



At first glance this also looks pretty good. Soldiers are mostly coming from cities and mostly from the European parts of the USSR. But if you turn on the point map, some of the points don't make sense. A point on Morskaia street in Lisii nos outside of Petersburg is what Yandex gave for Moraisk, which I presume is a Soviet settlement on the sea. I will write about Yandex's aggressive geocoding in the Geocoder post. In this case, it seems like the map overemphasizes cities as a source of soldiers by associating villages with streets in major cities. Another problem is that when Yandex is given a location like Krivchunka, Kiev Province and can't find Krivchunka, it gives the coordinates of the center of Kiev Province instead.
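One rough way to spot the second problem, offered here as a sketch and not as something done for this map, is to look for coordinates that many different place names share. If several distinct villages all land on exactly the same point, that point is probably a fallback such as a province center. The records and coordinates below are placeholders (only the Krivchunka example comes from the text), and the function name is hypothetical.

```python
from collections import defaultdict


def shared_coordinates(records, min_names=2, digits=4):
    """Return coordinates that at least `min_names` distinct place
    names were geocoded to, with the names found at each one."""
    names_at = defaultdict(set)
    for name, (lat, lon) in records:
        # Round so tiny floating-point differences do not hide a match
        names_at[(round(lat, digits), round(lon, digits))].add(name)
    return {
        coord: sorted(names)
        for coord, names in names_at.items()
        if len(names) >= min_names
    }


# Placeholder geocoding results; the coordinates are made up
records = [
    ("Krivchunka, Kiev Province", (50.0, 30.0)),
    ("Another village, Kiev Province", (50.0, 30.0)),
    ("A third village, Saratov Province", (51.5, 46.0)),
]
flagged = shared_coordinates(records)
for coord, names in flagged.items():
    print(coord, names)
```

Large cities will also be flagged this way, since many prisoners really were born in the same place, so the output is a list of points to check by hand rather than a list of errors.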

In short, the geodata we can pull from the database is pretty flawed. However, in a very crude way it shows that most prisoners were coming from Ukraine, Belorussia and European Russia. This pattern is perhaps an indication of the geography of the Red Army in the first parts of the USSR's war with Germany, when mass encirclements of Soviet soldiers led to huge numbers of prisoners. In any event, the database could surely be of use to someone and I would be very interested to hear more about how the researchers gathered the list of prisoners.

Wednesday, November 5, 2014

Guest Post: Deaf Space in Moscow

My friend Claire Shaw approached me a while ago with some interesting data: the addresses of deaf people in organizations based on archival documents from the Khrushchev period. After she put the addresses in a technologically conducive format, I made a couple of maps for her. But these maps were not mere charity. In exchange Claire offered to write up a couple of thoughts based on these maps for this blog. In the end, she did way more than that and I'm happy to host her guest post "Deaf Space in Moscow."


Deaf Space in Moscow

A couple of years ago, digging through the local files of the All-Russian Society of the Deaf in Moscow’s Central City Archive (TsGA Moskvy, formerly TsAGM), I came across records of a rather heated debate about the place of deaf people in the urban spaces of the Soviet capital. In late 1959, members of the society had been discussing the local subtitled cinema night, lamenting the poor behaviour of deaf people before and after the screenings: audience members would mingle in the streets, sign "loudly" to each other, obstruct traffic and generally make a nuisance of themselves. This discussion sparked off a furious debate about the visibility of deaf people in urban space and the communal policing of deaf behaviour. As one activist put it, "Why are deaf people crowding the streets? For five minutes they run around like mad people. Can that be allowed?"

This small discovery, part of a much bigger research project into the history of the Soviet deaf community, represented a jumping-off point for an exploration of the politics and practices of Soviet deaf space, which has culminated in an article forthcoming in Slavic Review in Spring 2015. This guest blogpost, then, is both a shameless plug for my own research, and the opportunity to introduce two brilliant digital maps that have been instrumental in helping me to understand deaf space in the Soviet context. The notion of deaf space has been growing in prominence in recent years, as part of a conceptual arsenal used by deaf studies scholars to explore issues of identity and belonging in deaf communities. Mike Gulliver, whose work has been particularly influential in this field, argues that deaf space is ‘produced’, in the Lefebvrian sense, as a distinct reality, as deaf people gather together as a community and author their "being in the world" through interactions in sign languages. As such, "deaf" spaces are qualitatively different from "hearing" spaces, defined predominantly by the visual experience of the world.

The late Soviet context is a particularly intriguing one through which to explore this concept of deaf space, as it represented the moment in which the Soviet deaf community became more physically – and institutionally – prominent in the Moscow cityscape. In the context of the housing campaigns of the Khrushchev era, the All-Russian Society of the Deaf (VOG) began a huge building programme, funded by profits from deaf society workshops, to build industrial, social and living spaces for its members. Thousands of square metres of deaf spaces were built across the country, creating new institutional buildings in which members of VOG could come together to work, relax, learn and live. When you take into account the fact that, by 1949, 96% of all deaf people in the capital were VOG members, then it is possible to view Soviet deaf space as a much more institutionalised and totalizing phenomenon than that found in other historical contexts.

From existing archive sources and cultural artefacts, I had formulated a few hypotheses about the way in which this Soviet deaf space was experienced, both by the deaf and the hearing. On the one hand, the creation of institutionalised deaf spaces enabled the development of a strong deaf community identity, built around particular "hubs" such as the VOG House of Culture on Sretenskii Tupik, the nearby Theatre of Sign and Gesture, or VOG Industrial Workshop No. 1 in Tekstil’shchiki. On the other, the gathering of deaf people together in these locations made them more visible in the eyes of the hearing, and created concerns on the part of both hearing and deaf people about the behaviour and identity of deaf people. Since the revolution, deaf people had been working to transform themselves into New Soviet (Deaf) People, becoming Stakhanovites and shock workers in industry, learning political literacy and demonstrating their kul’turnost’ through theatre and art. Yet the disruptive visibility of deaf people in urban space, particularly their visible use of sign language, threatened this façade of Soviet socialisation, a particularly problematic issue in the context of new models of the self and the communal policing of behaviour in the Khrushchev era.



Such hypotheses, of course, rested on the assumption that there was such a thing as Soviet deaf space, and that its locations could be mapped and studied. The two maps here show us that this was indeed the case, making use of archival data to map the locations of deaf space in Khrushchev-era Moscow. One of the files located in the Moscow City Archive contained handwritten records from 1963 of the location and membership of all VOG primary organisations, Red Corners and clubs in the city: the social spaces, often located in workplaces, which became a "home from home" for Soviet deaf citizens. These records have become two maps. The first, a point map, records the locations and membership figures for these local primary organisations. The second, a heat map, is more exciting for my purposes, showing the concentrations of deaf people within the Moscow cityscape. It highlights particularly populous deaf spaces, including the regions around Moscow Special Schools No. 101 (for the deaf) and 30 (for the hard of hearing), the deaf industrial workshops of the south east, the deaf cultural locations around the VOG Moscow House of Culture in the Sretenskii region, and the concentration of deaf pensioners in the north east, a phenomenon I have found difficult to demonstrate until now.


In my research, I try to use this conception of deaf space to ask bigger questions about marginality and inclusion within the Soviet body politic, and to complicate some of the existing narratives about top-down, exclusionary practices in the late Soviet era. These maps show us that deaf communities and spaces were far from ghettoized, and that they were mostly to be found in symbolically "Soviet" locations, such as the school, the factory, the club and the theatre. Yet as I trace in my work, the insistence of deaf people that they must remain together in institutionalised deaf spaces in order to "achieve" Sovietness and "live" socialism could not help but mark them out as a community apart. The ramifications of this exclusion are still being felt today.

Claire Shaw
University of Bristol, UK.