Monday, August 26, 2013

Who cares about Stalinist repression? Commemorative databases and regional historical memory

Among the key movements that came out of the Gorbachev period were those dedicated to remembering the victims of Stalinist repression.  Although organizations like Memorial or the Sakharov Center for Human Rights have since moved to advocate generally for human rights, one of their main functions remains commemoration of the victims of Stalinism, especially by collecting and publishing lists of victims.  Memorial has been particularly active in this regard, putting together a database of almost three million names.  But its list of victims (available here) was for the most part not originally compiled by the central Memorial organization.  Instead, it was researched by regional affiliates and other local groups.  What I am doing today is looking at which regions have been finding these names, as a rough assessment of where interest in commemorating Stalinist repression continues in Russia today.

Memorial's database includes not only people who were repressed during the Great Terror (1937-38) but also those exiled during collectivization or deported as members of the so-called "punished peoples" during World War II.  It is an incredible resource.  And Memorial's data (from this database and others) has been used for mapping the sites of the Stalinist repression pretty extensively at  For every entry, Memorial gives as much information as it has about the victim, including name, birthplace, date of arrest or exile and so on.  But for this post, I am really interested in the source of the information itself--who collected the data on these people.

Memorial compiled this list on a decentralized basis.  Each regional affiliate of Memorial or local historical society collected the names of victims and published these names as "kniga pamiati" or a "commemorative book."  And yet not every region contributes equal numbers of victims and some regions contribute none at all.  The introduction to the project explains some of the motivations and limitations of the project in the regions:

"In Russia, the process of collecting and publishing regional commemorative books remains the affair of the regions themselves.  The country has no state program to memorialize the victims of political repression.  There are no normative acts governing the preparation and publication of commemorative books, nor a standardized methodology or criteria for the collection of this data.  Therefore the preparation of the books varies.  In some places the books are prepared and published by the local administration or various institutions involved in one way or another with [formal legal] rehabilitation..., elsewhere by academic and cultural associations, and elsewhere they are published through the efforts of society with minimal or no support from regional authorities."

Of course, the collection and publication of this data does not correspond perfectly to the interest in commemorating the victims of Stalinism; there are other factors involved.  However, I am working from the assumption that there is a general correlation between the number of victims a territory commemorates and the interest in commemoration in that territory.  I think this would generally hold true even in territories where the administration is less amenable to the project.  For example, in regions where official financing for commemorative projects has been minimal, it is likely that the administration is not especially interested in facilitating the release of victims' names but also that the population is not interested in lobbying for these projects.

Using Python, I counted the number of listings for each collection and added collections from the same province together.  Most were easy to assign to a province, although a couple stumped me.  (Does anyone know who put out the databases Pol'skie zakliuchennye vorkutinskikh lagerei or Pol'skie spetspereselentsy v Arkhangel'skoi obl.? They account for almost all of the 67,000 entries I couldn't tie to a region.)  In total, I counted 2,644,774 names from regional historical associations with the other 300,000 published in Belarus, Kazakhstan, Kyrgyzstan, Ukraine and Uzbekistan.  With the data geocoded, I put it on a density map:

It's awfully small here with the provincial breakdown so I would recommend going to the bigger map, especially because I included the data and key for the geocodes there.  (A limitation of Google's Geochart when working with provinces is that it demands the International Organization for Standardization's code rather than the proper name of a province.)  What I expected when I thought of running this test was that Moscow and Petersburg would be the two big hot spots whose commemorative organizations would publish the names of lots of victims.  This expectation made sense to me because those cities have more resources than other areas and my impression is that there is a larger presence of anti-Stalinist cultural elites (e.g., in Memorial or the Sakharov Center).  And the sheer size of the population is bigger in Moscow and Petersburg than elsewhere, although I planned to account for that by normalizing the number of victims per capita as of the 2010 Russian census. Yet the regions putting out the largest number of names (and even more so on a per capita basis) were not Moscow and Petersburg but rather places like Tomsk, Komi and Chechnia.  In general, they have one or both of two qualities:
  1. Autonomous ethnic republics and territories
  2. Territories with large (and maybe just as important, infamous) Gulag camp complexes (see the map from 1931-1941 from below)
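For anyone curious about the mechanics, the counting and per capita steps described above could look roughly like the sketch below. This is not my original script. The collection names, the lookup table and the populations are all made-up placeholders (the populations are not census figures), and the province keys only imitate the ISO-style codes that Geochart wants.

```python
from collections import Counter

# Made-up listings: each item is the collection one name was published in.
listing_collections = ["book_a", "book_a", "book_b", "book_c", "book_c", "book_c"]

# Hypothetical lookup from collection to province (ISO 3166-2 style keys).
collection_to_province = {"book_a": "RU-TOM", "book_b": "RU-TOM", "book_c": "RU-KO"}

# Placeholder populations for illustration only, NOT real census figures.
population = {"RU-TOM": 1000, "RU-KO": 500}


def names_by_province(collections, lookup):
    """Count listings per collection and add collections from one province together."""
    totals = Counter()
    for collection in collections:
        # Collections that cannot be tied to a region are kept in their own bucket.
        totals[lookup.get(collection, "unassigned")] += 1
    return totals


def per_capita(totals, population, per=1000):
    """Names per `per` residents, for provinces with a known population."""
    return {p: totals[p] * per / population[p] for p in totals if p in population}


totals = names_by_province(listing_collections, collection_to_province)
rates = per_capita(totals, population)
ranking = sorted(rates, key=rates.get, reverse=True)
print(totals, rates, ranking)
```

The point of the second function is simply that two provinces with the same absolute count can rank very differently once population is taken into account.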

I was surprised by this result at first but I think it makes sense in a lot of ways.  The ethnic republics where the most names of victims of Stalinism have been published are naturally the republics from which large numbers of the titular nationality were deported.  The two leaders are Chechnia and Kalmykia, whose titular nationalities were expelled during World War II.  And of course the contemporary troubles in the Caucasus also contribute to its being of special interest.  But even in places like Tatarstan and Bashkortostan, where the titular nationalities were not systematically targeted with repression as far as I know, the numbers listed in the commemorative books are relatively high.  Of the eighty-one territorial units, they ranked fourth and sixth in absolute numbers and fourteenth and nineteenth in per capita numbers, respectively.  I would suggest two explanations for the high numbers in the autonomous republics.  The first is that Stalinist repression has become a touchstone for differentiating a republic's identity from the rest of Russia.  The second is that most of the Caucasians exiled during WWII from these territories went back once freed.

In provinces with high numbers and per capita proportions, connections between Memorial and the police seem to be playing a crucial role in names being published.  In Tomsk the Administration of Internal Affairs contributed 216,926 (!) names of the 240,256 total names listed from that territory.  I tried tracking down information on the connection there but found nothing concrete about it.  My feeling is that in general and in this particular case, it probably reflects the intensity of the efforts in the region to gain legal restitution for the politically repressed.  These efforts are related to the activity of local Memorial affiliates and similar organizations.  (The Tomsk Memorial website is great, by the way, and shows the dedication of that chapter.) [Update 6/2021: This site is now, unfortunately, off the internet. My amateur internet sleuthing suggests that the ownership of the URL lapsed and it was bought by a company trying to sell it back to the original owners. Who knows? A version of the site from 2013 is available at Internet Archive here.]  But they also reflect that many people sent to those areas remained there after their release, whether by choice or compulsion.

What is most interesting to me about these results is that they seem to reflect the site of settlement after repression and not the site of arrest.  In a sense, it makes a nice companion piece to the map of the Great Terror in Moscow that I posted earlier.  That data set allowed me to ask what areas were hit hardest by the terror.  The bigger list of victims lets me ask what areas still remember the terror and other Stalinist repressive campaigns.  It makes sense that the areas with the strongest efforts to commemorate the victims of Stalinist repression are those where the largest number of affected people live today.  It reminds us of the impact those policies had and continue to have on the families of people repressed under Stalin and the regions where they live.

Tuesday, August 20, 2013

Amateur Demography: Human Sex Ratio Edition

I have been toying around with a little Python module that I wrote to turn a spreadsheet with data over time into a density map by region using Google Geochart.  It also codes in the ability to change the map for different data sets with JavaScript.  This is a little dangerous since I found tons of demographic data at the website of the Institute of Demography at the Higher School of Economics. (Self promotion alert: I will be starting a postdoc in history at HSE next month.)  It has tons (thousands) of tables from every census conducted in Russia and the Soviet Union, which makes it a great candidate for mapping.
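To give a sense of what such a module has to do, here is a minimal sketch rather than the module itself. It assumes the spreadsheet has already been read into a dictionary keyed by region code, and it only builds the header-plus-rows structure that Geochart's `arrayToDataTable` accepts on the JavaScript side. The function names and the data layout are my own assumptions for illustration; the two values are the extremes mentioned in this post.

```python
import json

# Assumed layout: {region code: {census year: men per woman}}.
ratios = {
    "EE": {1959: 0.782},  # Estonia, lowest value mentioned in the post
    "TM": {1926: 1.2},    # Turkmenistan, highest value mentioned in the post
}


def geochart_rows(table, year, label="Men per woman"):
    """Build one data set: a header row followed by [region, value] rows."""
    rows = [["Region", label]]
    for region in sorted(table):
        value = table[region].get(year)
        if value is not None:  # skip regions missing from that census
            rows.append([region, value])
    return rows


def all_years(table):
    """One set of rows per census year, keyed by year for a dropdown menu."""
    years = sorted({year for series in table.values() for year in series})
    return {str(year): geochart_rows(table, year) for year in years}


# JSON that a page script could load and switch between with a dropdown.
payload = json.dumps(all_years(ratios))
print(payload)
```

Keeping one block of rows per year is what makes it easy to swap the data set when the dropdown changes.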

Ideally, a map will tell a little story and unfortunately some of the data sets don't tell very interesting ones.  The variation over time is minimal and expected.  But there are other data sets that turn into really nice visuals.  Here is one of them--the proportion of men to women in the USSR by republic from 1926 to 1989 (change the category with the dropdown menu):

In the map, fewer men means lighter coloring and a very high ratio of men to women would display as dark red.  I set the scale to start close to the lowest value (.782 men to women in 1959 in Estonia) and the high end near the highest value (1.2 men to women in Turkmenistan in 1926).  

The obvious change I expected was in the first post-war census (1959).  The war was hard on the entire population but it absolutely destroyed the cohort of men who were of ordinary service age, especially at the start of the war.  Catherine Merridale in Ivan's War cited an almost unimaginable statistic that 90 percent of men born in 1921 died in the war.  And you can really see this by comparing the maps from 1939 and 1959.  The entire Soviet Union bleeds out in the 1959 census but the contrast is especially strong in the areas where the war was fought.

One other thing popped out at me - the contrast of Russia, Ukraine and Belorussia with the Central Asian republics. (Also, Kazakhstan and Kyrgyzstan were autonomous republics of the Russian Soviet Republic until 1936 and so they do not show up in the 1926 census.  Same with the Baltics and Moldova until 1959.)   I don't know why the ratio of men to women was comparatively (and sometimes actually) high in Central Asia.  The possibility that came to mind was that there was under-reporting of the female population that was part of the general disenfranchisement of women in those republics.  Over time, the difference between the republics began to even out, which could suggest that programs to include women in the social and political culture of Central Asia had some effect (or at least made them legible as citizens to state authorities).  If there are any Central Asianists who have thoughts on this I would be interested in a more informed opinion.  

More of these kinds of maps forthcoming in the next few weeks.

Wednesday, August 14, 2013

Mapping the Great Terror in Moscow

Last week I found out about some data the Russian NGO Memorial compiled of the home addresses of people executed by the NKVD during 1937-38 (the Great Terror) in Moscow.  The original project is available here.  They compiled about 5,500 locations with, by my count, about 12,100 people.  It's pretty terrible stuff.  But it is also an amazing source and something that should be mapped.  And actually I poked around a bit (read: I checked a small link at the bottom of the project's page) and found that someone (Dmitriy Skougarevskiy) beat me to it.  His map is good but it was made ages ago (2011!) so it could use a bit of an update.  He was kind enough to publish the data he used in a Google Fusion Table here. [Update 6/2021: I am migrating all my maps from the beloved and discontinued Fusion Tables to the acceptable ArcGis online. Here is the new version. And here is the heat map version.]  (My table is here, by the way.)  Here is the map I came up with:

I improved on Skougarevskiy's map in two ways: First, his database was missing about 700 addresses that didn't have location data and I was able to find data for most of those (all but nine buildings, where ten executed people lived).  The new map is a mostly complete representation of the data. Second, I wanted to include a way to show not only the points but the number of people arrested, so I used a heat map to visualize it.  Unfortunately, the heat map is not great.  If you zoom in too far, the intensity dissipates and the coloring could be better.  These are limitations of using the very convenient Google applications instead of running the data through your own, nonexistent server.  (Info on the Google Maps API is available here.)  

So what is this data?  Memorial's introduction to the project is useful for understanding the period and what it means for scope and limitations of the data (my translation):

"This list is ordered...not by the place of burial but place of residence: from the addresses from which they [the victims] were taken to die.  Many readers will find here not only their own street, but their house and possibly even their apartment...

"The list includes just under twelve thousand people [more, actually SB], surely not a full count, not only because we don't have information about several thousand Muscovites who were killed and then rehabilitated [politically].

"The difficulty is also that in many of the investigative files in the archives no address is shown, or else it is so illegible that it is impossible to connect it to the topography of Moscow.  Almost no one on the list is from those regions of Moscow that were incorporated after 1960 (including Perovo, Liublino, Babushkin, Tsaritsyno and so on).  During the years of repression they were separate settlements with entirely different planning and finding their pre-war location today is almost impossible.  Additionally, thousands of people from city suburbs and other regions of the USSR were executed in Moscow.  For these reasons, this list includes just a third of the larger number of people executed in Moscow for political crimes."

Of course, it goes without saying that this group of people was just a fraction of those executed during 1937-38 overall as well.  Since 1991, historians have learned that in those two years around 700,000 people were executed and 1.5 million arrested by the NKVD at the orders of party leaders, Stalin above all.  Many of these victims were not convicted on the trumped up charges that we might associate with political crime or intellectual opposition to the regime (e.g., as in novels like 1984 or Darkness at Noon).  Instead, they were arrested for the political crime of having been exiled during collectivization as a rich peasant, having been arrested for petty crime too many times or having an ethnic background associated with a bordering state like Poles and Germans.

It is tempting to read hot-spots as buildings where the NKVD projected conspiratorial organizations onto ordinary social networks but it is also possible to read those hot-spots as buildings where lots of people lived.  It does seem like there is a greater intensity of executions in the center of Moscow, where high-ranking party members (hit hard during the terror) would have been more likely to reside.

There are one or two interesting moments:  The famous House on the Embankment (Ul. Serafimovicha d. 2) shows up as bright red, even zoomed in, because 242 people who lived there were executed.  Then there is another area with lots of arrests near metro station Mayakovskaya.  And clicking on the point map from there along Tverskaya reveals many buildings where multiple executed people lived.  Perhaps someone with a better knowledge of Moscow's geography in the 1930s has something more to say about the pattern of arrests.

If I have time, I may put out some more maps based on this data to look for other patterns.  Memorial posted a lot of information in the database that Skougarevskiy didn't include in his table. I think it would be particularly neat if I could run through the nationality or place of birth data or party membership from the original data Memorial provides.  Given what is now known about the mass operations in 1937-38, it wouldn't surprise me to see Poland and the Soviet borderlands (pre-war) be a hot spot.  Other thoughts about what might be interesting ways to visualize these records?

Six Degrees of Soviet Cinema

I have my version of the Oracle of Bacon for Soviet cinema close to finished on my computer and will post a link to it here for other Soviet cinema nerds to play with once I can figure out remote hosting. [Edit: It is now available here: Six Degrees of Soviet Cinema. Second Edit: It was throwing a server error and it was too much work to fix so I deleted it.]  In the meantime, I have all the data for Soviet cinema ready.  At the end I will reveal the Soviet Kevin Bacon, which is surely why most are reading this.

Here are the basics about my sources: I derived the database with Python from the website  It lists 7,866 films and, in these films, comes up with 52,703 people who were actors, directors, screenwriters, composers, art directors, camera operators or producers.  (It seems like a handful (~10) of these entries are the English language translations of the film title itself that my parser picked up as being actors.  This isn't ideal but not enough to throw the centrality measures off significantly.)  The collection seems like a surprisingly complete set of Soviet films.  If there are missing films I had trouble finding them.  For example, I looked for and found the obscure, lost 1932 film The Guy from the Missouri River, a film about a fictional agricultural commune based on the Seattle (American) Commune that I have done research on.  A large number of the entries are for films that IMDB doesn't have.

There are ways that the database is incomplete, though.  IMDB does a great job of including every person who worked on a film and does not, so many people--especially crew--are not included. On the flip side, it could be argued that some films and actors should not be in the database at all.  The Blue Bird, a Lenfilm/Twentieth Century Fox fantasy film co-production from 1976, is one example (and the reason that Elizabeth Taylor (Elizabet Teilor) and Jane Fonda (Dzhein Fonda) appear in the database).  It also might make sense to have banned films count differently, although they are included for technical and intellectually justifiable reasons because they were part of the Soviet film industry.  The database includes some television shows. (Only mini-series programs, it seems, but I could be wrong.)  It does not include documentary films, which is a hard pill to swallow because it necessarily meant the exclusion of significant figures like Dziga Vertov.  I included a video below as a form of apology.  However, the exclusion was necessary because including documentaries would have introduced a large number of "actors" into the database who were actually the subjects of these films and not involved in their production.  But all in all, it seems like the listings for fiction films are as complete as can be found without digitizing Soviet film catalogs by hand.  (Of course, if there are any corrections for individual films or if there is a more complete data set I would appreciate a heads up.)

I had two goals when putting this database together.  The first was to make something similar to Oracle of Bacon that would be a little toy for Soviet cinema buffs to test out.  But I also wanted to use graph theory to assess the structure of Soviet cinema as a professional network from a quantitative perspective.  The main metric I think is helpful for understanding the network is centrality, the average of the distances from any one person to any other person in their network.  I calculated the centrality two ways.  The classic gold standard of centrality calculations (the Coca-Cola if you will) is degrees of separation between people.  In this network representation it doesn't matter if two people have done one movie together or a hundred movies together; any connection is a connection.  The other calculation is the centrality based on weighted distance (Pepsi?). This representation makes connections count more or less based on the number of films two people worked on together (more films representing a stronger connection).  The first I included because it is more intuitive (if someone's centrality is two, it means that they can get to a random figure in the network in two steps on average).  The second I included because I think it more accurately represents how professional networks operate (take a look at my last post for information on this).  And remember that a lower number is more central because it means less distance from that person to get to any other person on average.
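For the unweighted version, the idea can be sketched in a few lines. This is a toy illustration with made-up film and person labels, not the code I ran on the full database: it links everyone credited on the same film, finds degrees of separation with a breadth-first search, and averages them.

```python
from collections import defaultdict, deque
from itertools import combinations

# Made-up credits: film label -> people credited on it.
films = {
    "film_a": ["p1", "p2"],
    "film_b": ["p2", "p3"],
    "film_c": ["p3", "p4"],
}


def build_graph(films):
    """Connect every pair of people who share at least one film."""
    graph = defaultdict(set)
    for people in films.values():
        for a, b in combinations(set(people), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degrees_of_separation(graph, source):
    """Breadth-first search: number of steps from source to everyone reachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_distance(graph, source):
    """Average number of steps to every other reachable person (lower = more central)."""
    dist = degrees_of_separation(graph, source)
    others = [d for person, d in dist.items() if person != source]
    return sum(others) / len(others) if others else float("inf")


graph = build_graph(films)
print(average_distance(graph, "p1"), average_distance(graph, "p2"))
```

In this toy chain, p2 sits in the middle and so averages fewer steps than p1 at the end, which is exactly the sense in which a lower number is more central.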

Even before I did any of these calculations, though, the main thing that jumped out at me was just how connected the world of Soviet cinema was. Of the 52,703 entries in the database, only ten cannot be connected to the main network.  (The two casts from films from the interwar period Zasukha (1932) and Beloe zoloto (1929) apparently only worked on those films, respectively.) The rest--more than 52,000 people--can all be connected to each other through one film or another in six or fewer steps.  In fact, the majority of figures (37,141) can reach any other figure in that network in four steps and most of the rest (15,467) can reach any other figure in five steps.  There are thirty-five super connected figures who can get to anyone in three or fewer steps.  In other words--and this may seem obvious--the world of Soviet cinema was very small.

There are other things this graph can tell us that are less obvious.  For example, how connected was the average Soviet film worker and what can that tell us about Soviet professional networks more generally? I made a chart that aggregated the centrality of the people in the network into quarter steps.
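Aggregating into quarter steps is just binning. A minimal sketch of one way to do it follows; the sample values are made up for illustration, and rounding each value down to the nearest quarter is my assumption here rather than a record of exactly how the chart was built.

```python
import math
from collections import Counter


def quarter_step_histogram(values, step=0.25):
    """Count values per bin, rounding each value down to a multiple of `step`."""
    bins = Counter()
    for value in values:
        bins[math.floor(value / step) * step] += 1
    return dict(sorted(bins.items()))


# Made-up centrality scores, not values from the database.
sample = [1.97, 2.3, 2.6, 2.76, 2.8, 3.1]
histogram = quarter_step_histogram(sample)
print(histogram)
```

Changing `step` to 0.1 would give the finer breakdown by tenths.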

The average was 2.76, meaning the average person in the Soviet cinema network can be connected to any other figure in between two and (more likely) three steps on average. A majority of people in the network fall right around that mean.  Then there is a group of about 15 percent of the network who could reach anyone else in between two and two and a half steps.  And then there is a very small elite group who can get to anyone in about two steps.  (If I broke it down further, it would show that those under two steps are really averaging about 1.97 steps.)  If this was an accurate representation of the professional network of Soviet cinema, it would seem to suggest that there were some very well connected people in Soviet cinema, many people who had average connections and only a few who were very poorly connected.

There is another way of measuring centrality, which is to weight the network. It makes sense to account for the strength of someone's connection to others in networks like film where someone (Tim Burton) might work dozens of times with one or two highly central people (Johnny Depp) and that person's centrality to the network does not register as much as it should.

I don't think this picture even has all the Burton/Depp collaborations.

This strength of connection can be measured mathematically in various ways.  The basic way is to count the number of times two people from a network are connected and make the cost of their connection equal to one/count of connections.  So if person1 was in four movies with person2 who was in two movies with person3 the path from person1 to person3 would be .25 + .50 = .75.  In a second calculation of centrality, I used this way of calculating shortest weighted path.  Here is the chart of the aggregated weighted path centralities by quarter steps:
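(An aside for the technically inclined: the weighted calculation just described can be sketched as below. This is an illustration and not the code I ran. It counts shared films for each pair, sets the cost of a connection to one divided by that count, and finds the cheapest path with Dijkstra's algorithm. The person labels are the placeholder names from the example above.)

```python
import heapq
from collections import defaultdict
from itertools import combinations


def collaboration_counts(casts):
    """Count how many films each pair of people shares."""
    counts = defaultdict(int)
    for cast in casts:
        for a, b in combinations(sorted(set(cast)), 2):
            counts[(a, b)] += 1
    return counts


def weighted_graph(counts):
    """Cost of a connection = 1 / number of shared films."""
    graph = defaultdict(dict)
    for (a, b), n in counts.items():
        graph[a][b] = 1.0 / n
        graph[b][a] = 1.0 / n
    return graph


def shortest_weighted_paths(graph, source):
    """Dijkstra's algorithm: cheapest total cost from source to each reachable person."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# person1 and person2 share four films; person2 and person3 share two.
casts = [["person1", "person2"]] * 4 + [["person2", "person3"]] * 2
dist = shortest_weighted_paths(weighted_graph(collaboration_counts(casts)), "person1")
print(dist["person3"])  # .25 + .50 = .75, matching the example above
```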

It turns out that a couple of differences from the unweighted chart are interesting and, based on what I know about Soviet cinema, are probably more representative of the professional network of Soviet cinema.  This data divides film personnel more clearly into two groups.  There is a group of film personnel who are quite central (to about 1.6 on this scale), a dip and then the majority of personnel clumped around the mean. (And the dip around 1.6 would be bigger if I broke the data into tenths rather than quarters.) Compared to the unweighted data, it would suggest that there was a two-tiered hierarchy in Soviet cinema.  The comparison of the two data sets (using their standard deviation, since the average weighted path is shorter) makes it clearer:

Why do I think the weighted calculation represents the professional world of Soviet cinema better than the unweighted calculation?  I analyzed the issue in my last post, but in general I think that a weighted representation in a social or professional network makes more sense, since collaboration on a larger number of projects usually reflects stronger personal/professional connections than collaboration on fewer.  But in the specific case of this cinema database and the reality of Soviet cinema, I also think there are conditions that make a weighted calculation more appropriate.  My anecdotal observation from looking through the database is that many people in Soviet cinema did one or two films (as represented by the big bulge of people in the middle of the weighted data) and then moved on to whatever else. (And some of those people are like "Dzhein Fonda," who obviously has done many films but only one that registers in this database.)  But a small-medium size group of people in the database (12,000-15,000) did many, many films and therefore was far more plugged into the Soviet film network.  My interpretation based on these figures is that these people represent the core of the film industry in the USSR.  (But only represent, since the entire crew of films is not included in the database.)

There are other interpretations that I think also might explain (or contribute to) this distribution of centrality.  One that struck me as very plausible was that the explosion of media in the post-Stalin period ("Moscow Prime-Time" as Kristen Roth-Ey puts it in her book) meant that figures who worked in that period were naturally going to rate as more central overall.  I had problems thinking of a really famous Stalin-era actor but we can take Mikheil (Mikhail) Gelovani--the actor who played Stalin and was therefore in many films until 1953.  Gelovani ranks just 2,501st overall in the weighted calculation (about fifth percentile) and 4,873rd in the unweighted calculation (about tenth percentile).  In contrast, Aleksandr Dem'ianenko--the star of Leonid Gaidai's Shurik movies and a fixture of post-war film--ranks 27th and 15th, in the first percentile for both.

Gelovani as Stalin in The Fall of Berlin, one of his best known roles

Similarly, Sergei Eisenstein ranks lower than Gaidai, even though they worked on a similar number of films.  Eisenstein is in the top third and top half of directors by weighted and unweighted centrality whereas Gaidai is in the top percentile and top eighth percentile in the same rankings.  Gaidai measures as way more central because of the comparative size of the industry during his era.  That said, I think that it just takes figures from before the 1950s a hop or two to get out of their era.  That pushes them to the back of the core professional cinema group but not out of it entirely.   More on the difference in the film industry in different eras in another post because my computer is currently chewing on the networks from each of the different eras.

(I'd also like to note here that the weighted-Gaidai metric proves that Leonid Gaidai is basically the Soviet Judd Apatow.  Directors/producers who always work with the same people don't register as being as central in the unweighted calculation but the strong connections to collaborators show up in the weighted stats.  Here is a video of the best of Gaidai.  How many times do Dem'ianenko or Iurii Nikulin or Georgii Vitsin or other familiar faces appear in it? Answer: a million times.)

Best of Gaidai according to someone on YouTube

Besides the different eras, I had considered the possibility that the distribution was being affected by the different jobs of film workers.  I assumed that actors would be more central on average (a lower number) because they can work on many projects while directors and other non-acting personnel have to invest more time in individual projects.  This result would have suggested that the director was not the center of the Soviet film universe.  And, as every book about the history of cinema that uses qualitative measures shows, it would have been inaccurate.  Directors were clearly the people at the reins of the Soviet film industry. As it turns out, the data actually back up the traditional interpretation that puts the director at the center of Soviet film.  Take a look at the average centrality by job:

I had to think a little about why directors came out on top, because all the top figures are actors (except Gaidai, but even he is only in the top 200 in the weighted calculation).  So what is going on?  According to my database there were a small number of actors who ranked very highly (less than .01% in either calculation) but the majority of actors rated as much less central than workers in other tasks in Soviet film.  For example, about 43 percent of directors have weighted centrality ratings between .75 and 1.5 but only about 15 percent of actors are in those categories.  In fact, actors on average have the worst centrality rankings, except for producers--which is a small category that seems to have been imported in the late 1980s.    Take a look at the distributions of the different professions' centrality in the network by percentage (I just included the weighted.  If anyone wants the unweighted centrality I can post that as well but it is similar.):

Of course, normalizing the data by using percentages belies the overwhelming number of actors (40,000+) in the database versus the other types of workers.  The reason for this large number is that almost every movie has three times or more the credited cast as it has credited crew (according to  I don't know that this contributed to the higher average centrality of the crew positions.  It might decrease their average centrality if every crew member was included for every film.  But including more credits might also make some crew who are listed but undercredited more central.  What I think it really shows is that you didn't need to be a professional actor to be in a movie.  Even architectural historian Vladimir Papernyi is in the database for a bit part he had! (He was in the film Leap Year (1961).  His rank: somewhere in the 29,000s, not part of the core professional cinema group.) Check out his profile here.  So while many people could do some acting, not every person off the street could be a director, art director (artist) or camera operator.  

There is also some overlap between the categories, especially director and actor, and that seems to have added a few highly central people to the ranks of the directors.  The database counts anyone who has held any of the positions toward each of those positions.  That means that someone like Aleksei Alekseev, who did a lot of work both on screen and as a voice actor for dubbed films, gets to be a director because he was the sound director on the Soviet-Italian film Life is Beautiful.  He is the second most central director, but based almost exclusively on his acting career rather than his scant directing credentials.  There are more legitimate borderline cases, like Aleksei Batalov, who was one of the most famous Soviet actors but who also directed three films and is credited as the screenwriter of four Soviet-era films.  I felt uneasy arbitrarily deciding who had done what and just used the source's classification.  (Who am I to say if Clint Eastwood is an actor, director or lunatic chair-shaman?)
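The counting rule described above (one credit in a role is enough to be counted toward that role) can be sketched in a few lines.  The names, roles and helper functions here are placeholders I made up, not the database's actual schema.

```python
from collections import defaultdict

# Hypothetical (person, role) credits; names and roles are placeholders.
credits = [
    ("Person A", "actor"), ("Person A", "actor"), ("Person A", "director"),
    ("Person B", "director"), ("Person B", "screenwriter"),
    ("Person C", "actor"),
]


def people_by_role(rows):
    """Count a person toward every role they were ever credited in."""
    members = defaultdict(set)
    for person, role in rows:
        members[role].add(person)
    return members


def multi_role_people(rows):
    """Return people credited in more than one role, with their roles."""
    roles = defaultdict(set)
    for person, role in rows:
        roles[person].add(role)
    return {person: r for person, r in roles.items() if len(r) > 1}


groups = people_by_role(credits)
overlap = multi_role_people(credits)
print(sorted(groups["director"]))  # Person A lands here on a single directing credit
print(overlap)
```

Listing the multi-role people separately is a cheap way to spot cases where a profession's top ranks are really being filled by someone's work in another profession.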

Aleksei Batalov as the ultimate specimen of Soviet masculinity in Moscow Doesn't Believe in Tears

Finally, if you were wondering--Aleksei Batalov was not the Soviet Kevin Bacon.  And really, I should say he is not the number one central person, who in IMDb is actually Harvey Keitel.  I think the answer will be quite obscure.  According to the unweighted graph, it is Iurii Sarantsev:

Iurii Sarantsev

Sarantsev fits the bill, though, in a lot of ways.  He came up as a young actor just at the right time in Soviet cinema, in the early to mid 1950s, so that he could take part in the post-Stalin media explosion.  He had a few starring roles but mostly was a character actor.  Wikipedia credits him with being in seventy-nine (!) Soviet-era films and another fifty-two voice-acting roles for Soviet and foreign films. (The latter the database doesn't register.)  And he had some pipes.  Here he is as a singing taxi driver:

The weighted candidate for the honor of being the Soviet Kevin Bacon/Harvey Keitel is Nikolai Grabbe:

Grabbe is in a lot of ways similar to Sarantsev: lots of small or medium parts, a few starring roles and maybe as many films overall as Sarantsev.  But Grabbe probably gains in the weighted calculation from having started his career earlier (as a young actor during World War II in We're from the Urals) and from then appearing in a few bigger films (including a small role in Andrei Rublev) that connected him with other highly connected people.  Here is the full top ten that the database came up with:

Top Ten Centrality Rankings of Soviet Film Personnel
Rank  Unweighted            Weighted
1     Iurii Sarantsev       Nikolai Grabbe
2     Artem Karapetian      Artem Karapetian
3     Ivan Ryzhov           Iurii Sarantsev
4     Mikhail Gluzskii      Ivan Ryzhov
5     Nikolai Grabbe        Vladimir Ferapontov
6     Mariia Vinogradova    Konstantin Tyrtov
7     Igor' Efimov          Mikhail Gluzskii
8     Aleksandr Beliavskii  Viktor Filippov
9     Daniil Netrebin       Viktor Ural'skii
10    Konstantin Tyrtov     Nikolai Smorchkov

I don't think I am being especially ignorant to say that I don't recognize any of these names. (A search in Russian on YouTube for most of these actors picks up movies they have been in but very few of the clips created to showcase the work of the truly famous.  There's no Grabbe clip, for example.)  From reading over their biographies, it seems that most were like Sarantsev and Grabbe: born in the 1920s or 1930s (acting from the 1940s or 1950s onward), professional acting education, tons of acting work but nothing that would have made them huge names (feel free to correct me in the comments if I am wrong about their fame).  So my first reaction to this list of names was surprise:  Where are all the big names I am familiar with?  What about Sergei Bondarchuk or El'dar Riazanov (or Evgenii Leonov or Iurii Yakovlev, as Jared guessed)?  Surely they were more important?

I was thinking about it all wrong.  What this calculation of network centrality measures is not necessarily fame but rather position.  The relative obscurity of these figures highlights the value of network analysis for understanding Soviet cinema.  The data reveal a different (maybe banal) kind of influence that qualitative sources miss.  Those sources tend to focus on the brightest figures but don't register the ubiquitous people who stood out less.  You would probably not name Harvey Keitel as the most influential or important actor in Hollywood (maybe in the 1990s you might think Kevin Bacon, right?), but it makes a lot of sense to say that Keitel is one of the better positioned and networked people in film.  In the same way, those ubiquitous people from Soviet films were around all the time for a reason, and it probably was because, in their own way, they made Soviet cinema run.
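To make the "position, not fame" point concrete, here is a toy sketch.  I am assuming closeness centrality (how few steps it takes to reach everyone else) on a network where two people are linked if they share a film credit; the casts and names are invented, and the measure used for the rankings above may well differ.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical casts: a "Star" with one film, a "Bit Player" who turns up everywhere.
films = {
    "Film 1": ["Star", "Bit Player", "A"],
    "Film 2": ["Bit Player", "B", "C"],
    "Film 3": ["Bit Player", "D"],
    "Film 4": ["D", "E"],
}


def build_graph(film_casts):
    """Link every pair of people who share at least one film credit."""
    graph = defaultdict(set)
    for cast in film_casts.values():
        for a, b in combinations(cast, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def closeness(graph, source):
    """Closeness = (number of reachable people) / (sum of hop distances to them)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives unweighted shortest paths
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


graph = build_graph(films)
scores = {person: closeness(graph, person) for person in list(graph)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], round(scores[ranking[0]], 3))
```

In this toy network the bit player comes out on top simply by being in three of the four films, while the star sits one step further from almost everyone.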

To sum up, I think the data makes a pretty good case for a few conclusions about the Soviet film industry as a network:  
  1. It was highly interconnected.
  2. There was a small-to-medium sized group of professional filmmakers (more than 10,000 but fewer than 15,000) that was the most connected in this network.
  3. In that connected group, there was a small group of mega-connected actors who did lots of work but there were larger groups of professional filmmakers--especially directors, art directors and camera operators--who were at the center of Soviet cinema.
I also think I make a convincing argument that the weighted representation of the Soviet cinema network is probably more historically accurate (if less intuitive) than the unweighted network.  But it would also probably be less fun to try to guess what the weighted path between Kevin Bacon and Aleksei Batalov would be.  (My guess, 2.458.)
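If you do want to play the weighted version of the game, here is a rough sketch of how a weighted path could be computed with Dijkstra's algorithm.  The weighting rule (1 divided by the number of shared films, so frequent collaborators are "closer"), the names and the counts are all assumptions for illustration, not the scheme behind the weighted rankings above.

```python
import heapq

# Hypothetical shared-film counts between pairs of people.
shared_films = {
    ("A", "B"): 4,
    ("B", "C"): 2,
    ("A", "C"): 1,
    ("C", "D"): 1,
}


def build_weighted_graph(pairs):
    """Assumed rule: edge weight = 1 / shared films (more films = shorter edge)."""
    graph = {}
    for (a, b), n in pairs.items():
        weight = 1.0 / n
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def weighted_distance(graph, source, target):
    """Dijkstra's algorithm; returns (distance, path) or (inf, []) if unreachable."""
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while node in previous:  # walk back to the source
                node = previous[node]
                path.append(node)
            return dist, path[::-1]
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf"), []


graph = build_weighted_graph(shared_films)
distance, path = weighted_distance(graph, "A", "D")
print(distance, path)
```

Note that the weighted route from A to C goes through B (0.25 + 0.5) rather than taking the direct one-film link (1.0), which is why weighted distances come out as awkward decimals instead of neat whole-number "degrees."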