Wednesday, August 14, 2013

Six Degrees of Soviet Cinema

I have my version of the Oracle of Bacon for Soviet cinema close to finished on my computer and will post a link to it here for other Soviet cinema nerds to play with once I can figure out remote hosting. [Edit: It is now available here: Six Degrees of Soviet Cinema. Second Edit: It was throwing a server error and it was too much work to fix, so I deleted it.]  In the meantime, I have all the data for Soviet cinema ready.  At the end I will reveal the Soviet Kevin Bacon, which is surely why most are reading this.

Here are the basics about my sources: I derived the database with Python from a website that lists 7,866 films, and from these films it comes up with 52,703 people who were actors, directors, screenwriters, composers, art directors, camera operators or producers.  (It seems like a handful (~10) of these entries are English-language translations of film titles that my parser picked up as actors.  This isn't ideal, but it is not enough to throw the centrality measures off significantly.)  The collection seems like a surprisingly complete set of Soviet films.  If films are missing, I had trouble finding the gaps.  For example, I looked for and found the obscure, lost 1932 film The Guy from the Missouri River, a film about a fictional agricultural commune based on the Seattle (American) Commune that I have done research on.  A large number of the entries are for films that IMDB doesn't have.

There are ways that the database is incomplete, though.  IMDB does a great job of including every person who worked on a film and this site does not, so many people--especially crew--are not included. On the flip side, it could be argued that some films and actors should not be in the database at all.  The Blue Bird, a Lenfilm/Twentieth Century Fox fantasy film co-production from 1976, is one example (and the reason that Elizabeth Taylor (Elizabet Teilor) and Jane Fonda (Dzhein Fonda) appear in the database).  It also might make sense to have banned films count differently, although they are included for technical and intellectually justifiable reasons because they were part of the Soviet film industry.  The database includes some television shows. (Only mini-series programs, it seems, but I could be wrong.)  It does not include documentary films, which is a hard pill to swallow because it necessarily meant the exclusion of significant figures like Dziga Vertov.  I included a video below as a form of apology.  However, the exclusion was necessary because documentaries would have introduced a large number of "actors" into the database who were actually the subjects of these films and not involved in their production.  But all in all, it seems like the listings for fiction films are as complete as can be found without digitizing Soviet film catalogs by hand.  (Of course, if there are any corrections for individual films or if there is a more complete data set, I would appreciate a heads up.)

I had two goals when putting this database together.  The first was to make something similar to Oracle of Bacon that would be a little toy for Soviet cinema buffs to test out.  But I also wanted to use graph theory to assess the structure of Soviet cinema as a professional network from a quantitative perspective.  The main metric I think is helpful for understanding the network is centrality: the average of the distances from one person to every other person in their network.  I calculated the centrality two ways.  The gold standard classic of centrality calculations (the Coca-Cola if you will) is degrees of separation between people.  In this network representation it doesn't matter if two people have done one movie together or a hundred movies together; any connection is a connection.  The other calculation is the centrality based on weighted distance (Pepsi?). This representation makes connections count more or less based on the number of films two people worked on together (more shared films representing a stronger connection).  The first I included because it is more intuitive (if someone's centrality is two, it means that they can get to a random figure in the network in two steps on average).  The second I included because I think it more accurately represents how professional networks operate (take a look at my last post for information on this).  And remember that a lower number is more central because it means less distance from that person to any other person on average.
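For anyone who wants to see the unweighted version in code, here is a minimal Python sketch. It is not the code behind this database: the film titles and names are made up, and it assumes that any two people credited on the same film count as connected.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy credits (not from the database): film title -> people credited.
films = {
    "Film A": ["Anna", "Boris"],
    "Film B": ["Boris", "Vera"],
    "Film C": ["Vera", "Gleb"],
}

def build_graph(films):
    """Link every pair of people who share at least one film."""
    graph = {}
    for people in films.values():
        for person in people:
            graph.setdefault(person, set())
        for a, b in combinations(people, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def steps_from(graph, start):
    """Breadth-first search: number of steps from start to each reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

def centrality(graph, person):
    """Average number of steps from person to everyone else they can reach."""
    dist = steps_from(graph, person)
    others = [d for p, d in dist.items() if p != person]
    if not others:
        return float("inf")  # nobody reachable, so no meaningful average
    return sum(others) / len(others)

graph = build_graph(films)
print(centrality(graph, "Anna"))   # end of the chain
print(centrality(graph, "Boris"))  # middle of the chain, so a lower number
```

In this toy chain Boris sits closer to everyone than Anna does, so his number is lower, which is the sense in which lower means more central.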

Even before I did any of these calculations, though, the main thing that jumped out at me was just how connected the world of Soviet cinema was. Of the 52,703 entries in the database, only ten cannot be connected to the main network.  (The casts of two interwar films, Zasukha (1932) and Beloe zoloto (1929), apparently worked only on those films, respectively.) The rest--more than 52,000 people--can all be connected to each other through one film or another in six or fewer steps.  In fact, the majority of figures (37,141) can reach any other figure in that network in four steps and most of the rest (15,467) can reach any other figure in five steps.  There are thirty-five super-connected figures who can get to anyone in three or fewer steps.  In other words--and this may seem obvious--the world of Soviet cinema was very small.
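The two checks in that paragraph (who is cut off from the main network, and how many steps each person needs to reach anyone else) can be sketched roughly as follows. This is an illustration on a made-up five-person network, not the code I ran on the real data.

```python
from collections import deque

# Hypothetical toy network: person -> set of co-workers.
graph = {
    "Anna": {"Boris"},
    "Boris": {"Anna", "Vera"},
    "Vera": {"Boris"},
    "Dina": {"Egor"},  # an isolated cast whose members worked nowhere else
    "Egor": {"Dina"},
}

def steps_from(graph, start):
    """Breadth-first search: number of steps from start to each reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

def components(graph):
    """Group people into sets whose members can all reach one another."""
    seen, groups = set(), []
    for person in graph:
        if person not in seen:
            group = set(steps_from(graph, person))
            seen |= group
            groups.append(group)
    return groups

main = max(components(graph), key=len)  # the main network
cut_off = set(graph) - main             # people who cannot be connected to it

# The most steps each person in the main network needs to reach anyone else in it.
max_steps = {p: max(steps_from(graph, p).values()) for p in main}
print(sorted(cut_off), max_steps)
```

Counting how many people share each value in `max_steps` gives the kind of breakdown quoted above (so many people at four steps, so many at five).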

There are other things this graph can tell us that are less obvious.  For example, how connected was the average Soviet film worker and what can that tell us about Soviet professional networks more generally? I made a chart that aggregated the centrality of the people in the network into quarter steps.
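Aggregating into quarter steps just means dropping each person's centrality into a bucket a quarter of a step wide. A small sketch, with made-up scores and assuming each score is rounded down to the bottom of its bucket (rounding to the nearest quarter would be an equally reasonable choice):

```python
import math
from collections import Counter

# Hypothetical centrality scores, not values from the database.
scores = [1.97, 2.3, 2.76, 2.8, 3.1]

def quarter_step(value):
    """Round a score down to the quarter step that starts its bucket."""
    return math.floor(value * 4) / 4

# Count how many people land in each quarter-step bucket.
bins = Counter(quarter_step(s) for s in scores)
for step in sorted(bins):
    print(f"{step:.2f}-{step + 0.25:.2f}: {bins[step]}")
```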

The average was 2.76, meaning the average person in the Soviet cinema network can be connected to any other figure in between two and (more likely) three steps. A majority of people in the network fall right around that mean.  Then there is a group of about 15 percent of the network who could reach anyone else in between two and two and a half steps.  And then there is a very small elite group who can get to anyone in about two steps.  (If I broke it down further, it would show that those under two steps are really averaging about 1.97 steps.)  If this were an accurate representation of the professional network of Soviet cinema, it would seem to suggest that there were some very well connected people in Soviet cinema, many people who had average connections and only a few who were very poorly connected.

There is another way of measuring centrality, which is to weight the network. It makes sense to account for the strength of someone's connection to others in networks like film, where someone (Tim Burton) might work dozens of times with one or two highly central people (Johnny Depp) and that person's centrality to the network does not register as much as it should.

I don't think this picture even has all the Burton/Depp collaborations.

This strength of connection can be measured mathematically in various ways.  The basic way is to count the number of times two people from a network are connected and make the cost of their connection equal to one divided by that count.  So if person1 was in four movies with person2, who was in two movies with person3, the path from person1 to person3 would cost .25 + .50 = .75.  In a second calculation of centrality, I used this way of calculating shortest weighted path.  Here is the chart of the aggregated weighted path centralities by quarter steps:
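To make the person1/person2/person3 arithmetic concrete, here is a minimal sketch of the weighted calculation. The use of Dijkstra's algorithm is an assumption for illustration (it is the standard way to find cheapest paths when costs are positive), and the function names are made up; only the 1/count cost comes from the description above.

```python
import heapq

# Hypothetical counts of shared films, mirroring the example in the text.
shared_films = {
    ("person1", "person2"): 4,
    ("person2", "person3"): 2,
}

def build_weighted_graph(shared_films):
    """Each connection costs 1 / (number of films the two people share)."""
    graph = {}
    for (a, b), count in shared_films.items():
        cost = 1 / count
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def weighted_distances(graph, start):
    """Dijkstra's algorithm: cheapest total cost from start to each reachable person."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, current = heapq.heappop(heap)
        if d > dist[current]:
            continue  # stale entry; a cheaper route was already found
        for neighbor, cost in graph[current].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

graph = build_weighted_graph(shared_films)
dist = weighted_distances(graph, "person1")
print(dist["person3"])  # 0.75
```

Averaging these costs over everyone a person can reach gives a weighted centrality in the same way that averaging step counts gives the unweighted one.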

It turns out that a couple of differences from the unweighted chart are interesting and, based on what I know about Soviet cinema, probably make this chart more representative of the professional network of Soviet cinema.  This data divides film personnel more clearly into two groups.  There is a group of film personnel who are quite central (up to about 1.6 on this scale), a dip and then the majority of personnel clumped around the mean. (And the dip around 1.6 would be bigger if I broke the data into tenths rather than quarters.) Compared to the unweighted data, it would suggest that there was a two-tiered hierarchy in Soviet cinema.  The comparison of the two data sets (using their standard deviation, since the average weighted path is shorter) makes it clearer:

Why do I think the weighted calculation represents the professional world of Soviet cinema better than the unweighted calculation?  I analyzed the issue in my last post, but in general I think that a weighted representation of a social or professional network makes more sense, since collaboration on a larger number of projects usually reflects stronger personal/professional connections than collaboration on fewer.  But in the specific case of this cinema database and the reality of Soviet cinema, I also think there are conditions that make a weighted calculation more appropriate.  My anecdotal observation from looking through the database is that many people in Soviet cinema did one or two films (as represented by the big bulge of people in the middle of the weighted data) and then moved on to whatever else. (And some of those people are like "Dzhein Fonda," who obviously has done many films but only one that registers in this database.)  But a small-to-medium-sized group of people in the database (12,000-15,000) did many, many films and therefore was far more plugged into the Soviet film network.  My interpretation based on these figures is that these people represent the core of the film industry in the USSR.  (But only represent, since the entire crew of films is not included in the database.)

There are other interpretations that I think also might explain (or contribute to) this distribution of centrality.  One that struck me as very plausible was that the explosion of media in the post-Stalin period ("Moscow Prime-Time" as Kristin Roth-Ey puts it in her book) meant that figures who worked in that period were naturally going to rate as more central overall.  I had problems thinking of a really famous Stalin-era actor but we can take Mikheil (Mikhail) Gelovani--the actor who played Stalin and was therefore in many films until 1953.  Gelovani ranks just 2,501st overall in the weighted calculation (about fifth percentile) and 4,873rd in the unweighted calculation (about tenth percentile).  In contrast, Aleksandr Dem'ianenko--the star of Leonid Gaidai's Shurik movies and a fixture of post-war film--ranks 27th and 15th, in the first percentile for both.

Gelovani as Stalin in The Fall of Berlin, one of his best known roles

Similarly, Sergei Eisenstein ranks lower than Gaidai, even though they worked on a similar number of films.  Eisenstein is in the top third and top half of directors by weighted and unweighted centrality whereas Gaidai is in the top percentile and top eighth percentile in the same rankings.  Gaidai measures as far more central because of the comparative size of the industry during his era.  That said, I think that it just takes figures from before the 1950s a hop or two to get out of their era.  That pushes them to the back of the core professional cinema group but not out of it entirely.  More on the difference in the film industry in different eras in another post, because my computer is currently chewing on the networks from each of the different eras.

(I'd also like to note here that the weighted-Gaidai metric proves that Leonid Gaidai is basically the Soviet Judd Apatow.  Directors/producers who always work with the same people don't register as being as central in the unweighted calculation but the strong connections to collaborators show up in the weighted stats.  Here is a video of the best of Gaidai.  How many times do Dem'ianenko or Iurii Nikulin or Georgii Vitsin or other familiar faces appear in it? Answer: a million times.)

Best of Gaidai according to someone on YouTube

Besides the different eras, I had considered the possibility that the distribution was being affected by the different jobs of film workers.  I assumed that actors would be more central on average (a lower number) because they can work on many projects while directors and other non-acting personnel have to invest more time in individual projects.  That result would have suggested that the director was not the center of the Soviet film universe and, as every book about the history of cinema that uses qualitative measures shows, that would have been inaccurate: directors were clearly the people at the reins of the Soviet film industry.  As it turns out, the data actually back up the traditional interpretation that puts the director at the center of Soviet film.  Take a look at the average centrality by job:

I had to think a little about why directors came out on top, because all the top figures are actors (except Gaidai, but even he is only in the top 200 in the weighted calculation).  So what is going on?  According to my database there were a small number of actors who ranked very highly (less than .01% in either calculation) but the majority of actors rated as much less central than workers in other jobs in Soviet film.  For example, about 43 percent of directors have weighted centrality ratings between .75 and 1.5 but only about 15 percent of actors are in those categories.  In fact, actors on average have the worst centrality rankings, except for producers--which is a small category that seems to have been imported in the late 1980s.  Take a look at the distributions of the different professions' centrality in the network by percentage.  (I included just the weighted.  If anyone wants the unweighted centrality I can post that as well, but it is similar.)

Of course, normalizing the data by using percentages belies the overwhelming number of actors (40,000+) in the database versus the other types of workers.  The reason for this large number is that almost every movie has three times or more the credited cast as it has credited crew (according to the site).  I don't know whether this contributed to the higher average centrality of the crew positions.  It might decrease their average centrality if every crew member was included for every film.  But including more credits might also make some crew who are listed but undercredited more central.  What I think it really shows is that you didn't need to be a professional actor to be in a movie.  Even architectural historian Vladimir Papernyi is in the database for a bit part he had! (He was in the film Leap Year (1961).  His rank: somewhere in the 29,000s, not part of the core professional cinema group.) Check out his profile here.  So while many people could do some acting, not every person off the street could be a director, art director (artist) or camera operator.

There is also some overlap between the categories, especially director and actor, and that seems to have added a few highly central people to the ranks of the directors.  The database counts anyone who has been in any of the positions toward each of those positions.  That means that someone like Aleksei Alekseev, who did a lot of work both on screen and as a voice actor for dubbed films, gets to be a director because he was the sound director on the Soviet-Italian film Life is Beautiful.  He is the second most central director, but based almost exclusively on his acting career rather than his scant directing credentials.  There are more legitimate borderline cases, like Aleksei Batalov, who was one of the most famous Soviet actors but who also directed three films and is credited as the screenwriter of four Soviet-era films.  I felt uneasy arbitrarily deciding who had done what and just used the site's classification.  (Who am I to say if Clint Eastwood is an actor, director or lunatic chair-shaman?)
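The overlap rule is easy to see in a small sketch. The names and scores here are invented; the only thing taken from the description above is that a person counts toward every job they ever held, so one very central actor with a single directing credit pulls the directors' average down (that is, makes directors look more central).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical people: name -> (set of jobs held, centrality score).
people = {
    "Actor with one directing credit": ({"actor", "director"}, 1.0),
    "Director only": ({"director"}, 2.0),
    "Actor only": ({"actor"}, 3.0),
}

def average_by_job(people):
    """Average centrality per job, counting each person once for every job they held."""
    scores = defaultdict(list)
    for jobs, score in people.values():
        for job in jobs:
            scores[job].append(score)
    return {job: mean(values) for job, values in scores.items()}

averages = average_by_job(people)
print(averages["director"], averages["actor"])
```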

Aleksei Batalov as the ultimate specimen of Soviet masculinity in Moscow Doesn't Believe in Tears

Finally, if you were wondering--Aleksei Batalov was not the Soviet Kevin Bacon.  And really, I should say he is not the number one central person, which in IMDB is actually Harvey Keitel.  I think the answer will seem quite obscure.  According to the unweighted graph, it is Iurii Sarantsev:

Iurii Sarantsev

Sarantsev fits the bill, though, in a lot of ways.  He came up as a young actor just at the right time in Soviet cinema, in the early to mid 1950s, so that he could take part in the post-Stalin media explosion.  He had a few starring roles but mostly was a character actor.  Wikipedia credits him with being in seventy-nine (!) Soviet-era films and another fifty-two voice-acting roles for Soviet and foreign films. (The latter the database doesn't register.)  And he had some pipes.  Here he is as a singing taxi driver:

The weighted candidate for the honor of being the Soviet Kevin Bacon/Harvey Keitel is Nikolai Grabbe:

Grabbe is in a lot of ways similar to Sarantsev: lots of small or medium parts, a few starring roles and maybe as many films overall as Sarantsev.  But Grabbe probably gains in the weighted calculation from having started his career earlier (as a young actor during World War II in We're from the Urals) and then having been in a few bigger films (including a small role in Andrei Rublev) that connected him with other highly connected people.  Here is the full top ten that the database came up with:

Top Ten Centrality Rankings of Soviet Film Personnel
Rank  Unweighted            Weighted
1     Iurii Sarantsev       Nikolai Grabbe
2     Artem Karapetian      Artem Karapetian
3     Ivan Ryzhov           Iurii Sarantsev
4     Mikhail Gluzskii      Ivan Ryzhov
5     Nikolai Grabbe        Vladimir Ferapontov
6     Mariia Vinogradova    Konstantin Tyrtov
7     Igor' Efimov          Mikhail Gluzskii
8     Aleksandr Beliavskii  Viktor Filippov
9     Daniil Netrebin       Viktor Ural'skii
10    Konstantin Tyrtov     Nikolai Smorchkov

I don't think I am being especially ignorant to say that I don't recognize any of these names. (A search in Russian on YouTube for most of these actors picks up movies they have been in but very few of the clips created to showcase the work of the truly famous.  There's no Grabbe clip, for example.)  From reading over their biographies, it seems that most were like Sarantsev and Grabbe: born in the 1920s or 1930s (acting from the 1940s or 1950s onward), professional acting education, tons of acting work but nothing that would have made them huge names (feel free to correct me in the comments if I am wrong about their fame).  So my first reaction to this list of names was surprise:  Where are all the big names I am familiar with?  What about Sergei Bondarchuk or El'dar Riazanov (or Evgenii Leonov or Iurii Yakovlev, like Jared guessed)?  Surely they were more important?

I was thinking about it all wrong.  What this calculation of network centrality measures is not necessarily fame but rather position.  The relative obscurity of these figures highlights the value of network analysis for understanding Soviet cinema.  The data reveal a different (maybe banal) kind of influence that is lost in qualitative sources.  Those sources tend to focus on the brightest figures but don't register those ubiquitous people who stood out less.  You would probably not name Harvey Keitel as the most influential or important actor in Hollywood (maybe in the 1990s you might think Kevin Bacon, right?) but it makes a lot of sense to say that Keitel is one of the better positioned and networked people in film.  In the same way, those ubiquitous people from Soviet films were around all the time for a reason, and it probably was because, in their own way, they made Soviet cinema run.

To sum up, I think the data make a pretty good case for a few conclusions about the Soviet film industry as a network:
  1. It was highly interconnected.
  2. There was a small-to-medium-sized group of professional filmmakers (more than 10,000 but fewer than 15,000) that was the most connected in this network.
  3. In that connected group, there was a small group of mega-connected actors who did lots of work but there were larger groups of professional filmmakers--especially directors, art directors and camera operators--who were at the center of Soviet cinema.
I also think I make a convincing argument that the weighted representation of the Soviet cinema network is probably more historically accurate (if less intuitive) than the unweighted network.  But it would also probably be less fun to try to guess what the weighted path between Kevin Bacon and Aleksei Batalov would be.  (My guess, 2.458.)
