Wednesday, July 27, 2016

Guccifer 2.0: Russian, not Romanian

Lorenzo Franceschi-Bicchierai recently interviewed Guccifer 2.0 over Twitter. Guccifer 2.0 released documents hacked from the Democratic National Committee and claims to be the DNC hacker; Franceschi-Bicchierai concluded that he is probably not Romanian, as he claims to be, and may very well be Russian. (Other evidence also points to the hack having been perpetrated by the Russians.)

What, if anything, does Guccifer 2.0's language usage in the interview say about his likely national origin?

I performed a linguistic and stylistic analysis of Guccifer 2.0's English-language responses in the interview. I examined all the clear errors in English grammar and style/usage, and considered whether each one indicates that the writer is more likely a native speaker of Romanian or of Russian.

Summary of Results:
There are seven unusual (non-native) features in the English of the interview. Of these seven, five clearly point to the author being a native Russian speaker, one weakly points in that direction, and the last is inconclusive. Hence we can conclude that the author is far more likely to be Russian than Romanian.

Detailed Analysis:
  1. He refers to himself as a "women lover", with the first noun in the plural (an English compound would normally use the singular, as in "woman lover"). In Romanian this would be "iubitor de femei" (plural "femei" rather than singular "femeie"), while in Russian it would be "любитель женщин". Note that the Romanian phrase contains the preposition "de", corresponding to English "lover of women", while the Russian phrase, like the English phrase he used, has no preposition. Also, the odds of the Russian combination occurring as a phrase (based on Google searches for the phrases restricted with site:ru and site:ro, respectively) are higher than those of the Romanian combination. Thus the phrase is more likely a calque from Russian than from Romanian.
  2. He writes "I've already told," instead of "I've already told you." This is a case of "pro-drop", where an argument of a verb (usually a pronoun) is dropped when it is understandable from context. In Russian, the equivalent sentence "Я уже сказал" (or "Я уже ответил") is grammatically and contextually correct. In Romanian, the equivalent sentence "Deja am spus" is technically correct, but without the pronoun "ți" (you) it sounds odd and robotic to native speakers. This example is thus evidence that the writer is more likely Russian than Romanian.
  3. He uses the word "deal" several times to mean "hack" or "operation" (e.g., "DNC isn't my first deal", "...all my deals"). In Russian, a deal is "сделка", business is "бизнес", an affair is "дело", an operation is "операция", an effort is "усилие", and an enterprise is "предприятие". In Romanian, these are, respectively, "afacere", "afaceri", "afacere", "operație", "efort", and "afacere". This may point towards Romanian, since a single Romanian word, "afacere", covers several of these concepts. More likely, though, it points towards Russian, because of the similarity in sound between "deal" and "дело" (delo), as well as "сделка" (sdelka): "deal" is a false cognate of these words, and a Russian speaker may choose it because it sounds alike.
  4. When asked to reply in Romanian as proof of his origin, he answers, "Man, I'm not a pupil at school." This is odd in three ways: the word "pupil" ("kid" would be more likely in colloquial English), the preposition "at" instead of "in", and the sentence-initial "Man, ...".
    1. In both Russian and Romanian, the phrase "pupil at school" ("ученик в школе" and "elev la școală", respectively) is slightly rarer than either "kid at school" or "boy at school". There is no clear difference here.
    2. In Russian, both English prepositions "in" and "at" are generally translated by "в", while Romanian has different prepositions, "în" (in) and "la" (at), giving evidence for a Russian native language.
    3. The odds that the phrase "I'm not" (Russian: "я не"; Romanian: "eu nu") is preceded by "Man, ..." (Russian: "Человек," or "Товарищ,"; Romanian: "Omule,") are more than five times greater in Russian than in Romanian. This phrasing is evidence for Russian over Romanian as the native language.
  5. One of the most prominent features of English written by Russian (and other Slavic-language) speakers is the frequent omission of articles ("a" and "the"), since those languages lack them. Romanian, by contrast, as a Romance language, has both definite and indefinite articles. While many of the interview responses used English articles correctly, a number did not (e.g., "I used [a] 0-day exploit" and "... the NGP VAN soft"). This inconsistency is not conclusive proof of Russian (or Slavic) L1 authorship, but it is strong evidence that the writer is a Russian speaker with relatively good English.
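Items 1 and 4.3 above both rest on comparing how often a word combination appears as a fixed phrase on Russian-language versus Romanian-language sites. The exact formula is not spelled out here, so the following is only a minimal sketch under one assumption: that "the odds of occurring as a phrase" means the share of pages containing both words in which they appear as the exact phrase, computed separately with site:ru and site:ro. The function name `phrase_odds_ratio` and the counts in the usage line are hypothetical placeholders, not the search results behind this analysis.

```python
def phrase_odds_ratio(ru_phrase: int, ru_both: int, ro_phrase: int, ro_both: int) -> float:
    """Return (ru_phrase / ru_both) / (ro_phrase / ro_both).

    Hypothetical helper (assumed interpretation of "odds of occurring as a phrase"):
      ru_phrase / ro_phrase: hits for the exact quoted phrase on site:ru / site:ro
      ru_both / ro_both:     hits for pages containing both words on that site
    A result above 1 means the words form a set phrase relatively more often
    on Russian-language pages than on Romanian-language pages.
    """
    if min(ru_both, ro_both) <= 0:
        raise ValueError("co-occurrence counts must be positive")
    if ro_phrase <= 0:
        raise ValueError("Romanian exact-phrase count must be positive to form a ratio")
    if not (0 <= ru_phrase <= ru_both and ro_phrase <= ro_both):
        raise ValueError("exact-phrase hits cannot exceed co-occurrence hits")
    # Cross-multiplying keeps the arithmetic in integers until the final division.
    return (ru_phrase * ro_both) / (ru_both * ro_phrase)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(phrase_odds_ratio(ru_phrase=500, ru_both=1000, ro_phrase=100, ro_both=1000))
```

With these made-up numbers the ratio is 5.0, i.e. the combination would be five times as likely to be used as a set phrase on Russian sites. Google hit counts are rough estimates, so a ratio like this is indicative at best.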

In total, there are seven oddities in the English text that may indicate the writer's native language. Five of the seven point clearly to Russian over Romanian, one ("deal") points weakly to Russian, and the last ("pupil") is inconclusive.
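The tally above can be checked mechanically. This sketch simply encodes the seven features with the verdicts given in the detailed analysis and counts them; the feature labels are shorthand introduced for this sketch, and the code verifies only the arithmetic of the summary, not the linguistic judgments themselves.

```python
from collections import Counter

# Verdicts as given in the detailed analysis; labels are shorthand for this sketch.
FEATURES = {
    "plural 'women lover' calque": "russian",
    "dropped object in 'I've already told'": "russian",
    "'deal' for hack/operation": "weak russian",
    "'pupil at school' word choice": "inconclusive",
    "'at' instead of 'in'": "russian",
    "opening with 'Man, ...'": "russian",
    "missing articles": "russian",
}


def tally(features: dict[str, str]) -> Counter:
    """Count how many features point in each direction."""
    return Counter(features.values())


if __name__ == "__main__":
    counts = tally(FEATURES)
    print(f"{len(FEATURES)} features: {dict(counts)}")
```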

Overall, therefore, the linguistic evidence consistently points towards the writer being a native Russian speaker. It is also possible that the writer is a Romanian speaker who has studied Russian (L2 features often spill into a third language more than L1 features do); however, the writer denied knowing any Russian, so the most reasonable conclusion is that he is a native Russian speaker rather than a native Romanian speaker. This evidence, combined with the evidently poor Romanian in the interview, clearly indicates that Guccifer 2.0 is a Russian pretending to be Romanian. (Note that this says nothing about whether the hack was perpetrated by state-backed actors or by an independent hacker or group.)

This analysis is quoted in the New York Times and Yediot Aharonot (in Hebrew).


UPDATE: 

Special Counsel Robert Mueller's July 13, 2018 indictment of 12 Russian GRU operatives confirms that Guccifer 2.0 was a front persona run by Russian nationals for the GRU, per US intelligence assessments.