Wednesday, September 30, 2015

Geocoding with Yandex

For the last post I created a map of the birthplaces of Soviet prisoners in Germany, based on a German-Russian database. I did this using a Python module called Geocoder. This module has three nice features compared to others I have used. First, other modules throw errors if Google (or Yandex, etc.) cannot find the location, whereas Geocoder returns an empty object. When processing thousands of locations, not having to restart the script after an error is a big plus. Second, it interfaces with all of the major GIS services (Google, Yandex, etc.) and is easy to code with. Third, it somehow gets around using an API key (i.e., registering with Google, etc., and remembering the twenty-digit encrypted key) in its queries. For novice programmers who need to geocode lots of places, this is a great module.
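The "empty object instead of an error" behavior lends itself to a simple loop that records misses and keeps going. What follows is a minimal sketch of that pattern, not necessarily the script used for the map. The names `geocode_all` and `lookup` are hypothetical, and the sketch assumes each result exposes `ok` and `latlng` attributes, as Geocoder's result objects do.

```python
import time


def geocode_all(places, lookup, pause=0.0):
    """Geocode every place, recording misses instead of stopping.

    `lookup` is any callable that takes a place name and returns an
    object with `.ok` and `.latlng` attributes (an assumption of this
    sketch; a function such as geocoder.yandex fits that shape).
    """
    found = {}    # place name -> (lat, lng)
    missing = []  # place names the service could not resolve
    for place in places:
        result = lookup(place)
        if result.ok and result.latlng:
            found[place] = tuple(result.latlng)
        else:
            missing.append(place)
        if pause:
            time.sleep(pause)  # optional politeness delay between queries
    return found, missing
```

With the module installed, a call might look like `found, missing = geocode_all(names, geocoder.yandex)`. Whether a given service still answers without an API key is worth checking before starting a long run.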

The danger of using an automated script to geocode, though, is that Google and Yandex don't know where everything is, and might even give bad results. Companies have different strategies for providing results. Yandex is more aggressive than Google about returning coordinates for a query. For example, in the last post for Soviet prisoners, I had about 270,000 locations I wanted to find, mostly in the former USSR. I ran a set through Google and Yandex, with the latter pulling results for far more of them. I assumed that Yandex has better GIS data for the former Soviet Union, so I had it do the entire list. It pulled most of the results, something like 250,000. The problem, though, is that Yandex aggressively autocorrects. For a village named Koromenskaia, Yandex assumed I meant Kolomenskaia, a metro station in Moscow. In other cases, Yandex understood "Village X, Voronezh Province" as just "Voronezh Province" and geocoded to the center of that province.
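One way to catch both kinds of error is to compare what was asked for with what the service says it found. The sketch below is a crude illustration under stated assumptions: `classify_result` and `normalize` are hypothetical helpers, and the code assumes the returned address is available as a plain string. A substring check like this will also trip over ordinary transliteration differences, so its labels are candidates for review, not verdicts.

```python
def normalize(text):
    """Lowercase, drop commas, and collapse whitespace for comparison."""
    return " ".join(text.lower().replace(",", " ").split())


def classify_result(place, province, returned):
    """Label one geocoder answer.

    place, province: the two parts of the original query.
    returned: the address string the service sent back (assumed available).
    """
    got = normalize(returned)
    if normalize(place) in got:
        return "ok"                 # the queried name survives in the answer
    if got == normalize(province):
        return "province_fallback"  # the service dropped the village entirely
    return "autocorrected"          # the service substituted another name
```

Sorting the output by label gives a list of entries to check by hand, which is far shorter than the full set.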

For a map of monuments uploaded to the memorial cataloging website Pomnite-Nas, I used Google and was fortunate to have only a few hundred that, when placed on a map, were clearly inaccurate. I corrected those few hundred--painstaking work but possible with that number. But with the prisoner map, it was unclear how many listings were inaccurate, if not totally incorrect. With tens of thousands of incorrect listings, it was impossible to find them all, let alone correct them.

I still think Yandex is worth using for people working on post-Soviet republics. For example, I just used Yandex to find a place called Miagit in Magadan province after Google failed. But geocoders should exercise caution when using any of these services or they might find themselves with false results.

Monday, September 21, 2015

Notes from a Database of Soviet Prisoners of WWII


I have been thinking about POWs and forced laborers lately. My article on digital memory of World War II came out with Memory Studies and in it I analyzed projects like the Russian government's OBD-Memorial, a huge database of Soviet soldiers who died in the war. I've also been working on a project on the repatriation of Soviet citizens after World War II, including prisoners of war. Part of the challenge of this project is identifying who the repatriates were--who was likely to end up as a forced laborer or POW in the war, and how did that affect the experience of imprisonment and return to the USSR?

In thinking about this issue, I started looking at available data on prisoners of war. OBD-Memorial hides its data behind a web app, making it impossible to analyze the database. However, I found a database of Soviet prisoners here, run by the Center of Documentation of the Saxony Memorial for the Victims of Political Terror. The database (in Russian or German) includes basic data on each prisoner (name, date of birth, birthplace, nationality, date of death) culled from Wehrmacht documents in former Soviet and German archives. The site says it includes prisoners from "the territory of the former German Reich." In total, the database includes 881,035 entries, which is a substantial share of the Soviet soldiers taken prisoner. The German estimate is 5-5.6 million and the Russian state's estimate is roughly 4.5 million. The difference, as I understand it, depends on whether non-combatants should be counted as prisoners, since the German army took officials, partisans and civilian men as war prisoners in addition to Red Army soldiers.

In any event, this data is not complete and it is unclear how representative it is of Soviet POWs generally. Among the total population in the database, 50.8 percent (447,642) have a date of death registered and the others presumably survived. This percentage is lower than the German historian Christian Streit's estimate of the overall mortality rate of prisoners (57 percent). Of prisoners listed as Jews, only 33.9 percent died, which is unbelievably low given that Pavel Polian says between 65 and 95 percent of Jews died. (And he believes the higher estimate is correct.) Another anomaly of the POWs in this database is that just 1,087 of the prisoners are listed as Jewish. Polian says that there were 85,000 Soviet-Jewish POWs total, making them a minimum of 1.5 percent of the total population that was captured, whereas the Jews in this database are just 0.2 percent. It is possible the database's Jewish population was just those who survived long enough to reach camps in Germany, which is just about right for Polian's figures. It is worth speculating that those who had somehow survived the initial period of systematic murder in occupied territory might have had a higher likelihood both of survival in labor camps (bringing the mortality rate down) and of hiding that they were Jews (bringing the Jewish population as a percent of the total down).

So it is possible that these prisoners are some (but not all) of those who survived to be interned in Germany. Of course, the very fact that these prisoners made it into the database might indicate that there was something else that made them not a representative sample even within the population of Soviet POWs in Germany. But let's assume for a minute that these figures are somehow representative of the broader population of Soviet prisoners in Germany or perhaps even soldiers overall. What does this data say about who was more likely to live or die in camps? Where were soldiers from?

Nationality: Of the total population, 540,707 had some nationality listed. For each national category, I pulled the total number of prisoners, the number who died in captivity and the number who appear to have survived their captivity because their date of death is not listed. Rather than posting the results here, I uploaded a spreadsheet with the groups that had more than a hundred total prisoners. I calculated the number of prisoners who survived over the number who died for each nationality and the differential from the average. The results are interesting: Russians made up the majority of those captured and survived at an average rate. Ukrainians were the second largest contingent and also survived at an average rate. The nationalities that survived at a disproportionately high rate in the database were Belorussians, Kalmyks, Chechens and Jews.
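The calculation described above can be sketched in a few lines. This is an illustration only: `survival_table` is a hypothetical helper, it takes "the average" to mean the pooled ratio across all groups (an assumption, since a mean of the per-group ratios would be another reasonable reading), and any counts passed to it here are placeholders, not figures from the database.

```python
def survival_table(counts):
    """Compute survived/died ratios and their gap from the pooled average.

    counts maps nationality -> (survived, died).
    Returns nationality -> (ratio, differential from the pooled ratio).
    Groups with no recorded deaths are skipped to avoid dividing by zero.
    """
    total_survived = sum(s for s, d in counts.values())
    total_died = sum(d for s, d in counts.values())
    average = total_survived / total_died  # pooled ratio (assumed meaning of "average")
    table = {}
    for name, (survived, died) in counts.items():
        if died > 0:
            ratio = survived / died
            table[name] = (ratio, ratio - average)
    return table
```

A positive differential marks a group that survived at a higher rate than the pool, and a negative one marks a group that survived at a lower rate.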

If we are thinking more broadly about the geography of the Red Army, this database might also have some interesting revelations. Presumably all soldiers were equally likely to be taken prisoner by the Germans, and so the distribution of the soldiers should resemble the Red Army overall. I used the excellent Python module Geocoder (more on this in a separate post) to get the coordinates for most of the locations given as the prisoners' place of birth, about 740,000 of the nearly 900,000 entries. If we average the locations, we end up with a soldier who is from somewhere in the middle of Saratov province, near the border with Kazakhstan. Seems plausible.
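For readers who want to try the averaging step, here is a minimal sketch. It is an assumption about method rather than a record of how the Saratov result was computed: `mean_location` is a hypothetical helper that converts each point to a unit vector on the sphere, averages the vectors, and converts back. Over an area as wide as the USSR this is somewhat safer than taking the plain mean of latitudes and longitudes.

```python
import math


def mean_location(points):
    """Return the spherical mean (lat, lon) in degrees of (lat, lon) pairs."""
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        # Unit vector for this point on the sphere.
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    # Convert the averaged vector back to latitude and longitude.
    lon = math.degrees(math.atan2(y, x))
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return lat, lon
```

Any average of this kind inherits the errors in the underlying points, so a plausible-looking center does not by itself validate the geocoding.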


Let's look at the map with all the prisoners:



At first glance this also looks pretty good. Soldiers are mostly coming from cities and mostly from the European parts of the USSR. But if you turn on the point map, some of the points don't make sense. A point on Morskaia street in Lisii nos outside of Petersburg is what Yandex gave for Moraisk, which I presume is a Soviet settlement on the sea. I will write about Yandex's aggressive geocoding in the Geocoder post. In this case, it seems like the map overemphasizes cities as a source of soldiers by associating villages with streets in major cities. Another problem is that when Yandex is given a location like "Krivchunka, Kiev Province" and can't find Krivchunka, it gives the coordinates of the center of Kiev Province.
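The province-center problem leaves a trace that can be searched for: many unrelated villages end up stacked on exactly the same coordinates. A minimal sketch of that check follows. `crowded_points` is a hypothetical helper and its thresholds are arbitrary defaults. A large city will legitimately collect many points too, so the output is a list of places to inspect, not a list of errors.

```python
from collections import Counter


def crowded_points(coords, min_count=50, digits=4):
    """Return [((lat, lon), count), ...] for coordinates shared by many entries.

    coords: iterable of (lat, lon) pairs, one per geocoded entry.
    min_count: how many entries must share a point before it is reported.
    digits: decimal places kept when deciding that two points coincide.
    """
    tally = Counter((round(lat, digits), round(lon, digits)) for lat, lon in coords)
    # most_common() orders the points from most to least crowded.
    return [(point, n) for point, n in tally.most_common() if n >= min_count]
```

Comparing the distinct place names attached to each crowded point helps separate a real city from a fallback to a province center.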

In short, the geodata we can pull from the database is pretty flawed. However, in a very crude way it shows that most prisoners were coming from Ukraine, Belorussia and European Russia. This pattern is perhaps an indication of the geography of the Red Army in the first parts of the USSR's war with Germany, when mass encirclements of Soviet soldiers led to huge numbers of prisoners. In any event, the database could surely be of use to someone and I would be very interested to hear more about how the researchers gathered the list of prisoners.