Saturday, April 8, 2017

Initial Linguistic Analysis of the Shadowbrokers Texts

This is a repost of my forensic linguistic analysis of "The Shadowbrokers texts", as posted on the Taia Global blog on August 18, 2016.
This is an initial linguistic analysis of the texts from “The Shadowbrokers”, as posted on Pastebin, taken from the Tumblr account. This is a qualitative analysis, looking at patterns of grammatical and orthographic errors, to examine the question of whether the author of the text is a native speaker/writer of US English. A quantitative analysis would help firm up these conclusions and estimate their reliability. This analysis assumes that all the texts were written by a single individual.
There are a number of grammatical errors that are not usual in native speaker US English:

  1. Omission of definite and indefinite articles (“a” and “the”)
  2. Omission of infinitive “to” (e.g., “I want get” instead of “I want to get”)
  3. Omission of modal verbs “should” and “must” and auxiliary verb “will”
  4. Elision of “it” in “it is ...”
  5. Use of progressive form “is Xing” instead of present or past tense form “X” (e.g., “He is breaking” instead of “he breaks” or “he broke”)
  6. Use of “are X” instead of “are Xing” or “X” (“they are go” instead of “they are going” or “they go”)
  7. Tense confusion – use of base verb form instead of past tense
None of these errors is perfectly consistent – there are examples of correct usage for all of them. Indeed, some grammatically complex sentences are perfectly correct, such as “No other information will be disclosed by us publicly” and “Your wealth and control depends on electronic data.” Most sentences, though, have multiple errors.
Evidence that the author is a native speaker trying to appear non-native:
  1. Spelling. The spelling is entirely correct throughout, including some long and complex words such as “dictatorship”, “prostitutes”, and “consolation”. If this had been achieved through the use of spell-checking software, we would have expected to see at least one “Cupertino” (a correctly spelled but contextually wrong word substituted by the spell-checker).
  2. Inconsistent errors. Grammatical errors such as omitting the infinitive “to” or using “is breaking” to mean “breaks” result, in a non-native writer, from deeply held intuitions about how grammar works. The fact that errors 2, 3, 5, and 6 all occur inconsistently (they occur a majority of the time, but not by much) indicates that someone was inserting errors, rather than making them naturally.
  3. Mutually inconsistent errors. Errors 5 and 6 are odd together – if the writer knows about the progressive (-ing) form, then why do they use it only sometimes, when using the auxiliary “is” or “are” with the verb?
  4. Grammatical errors in idioms. There are a number of idioms that would be surprising for a low-skilled non-native speaker to use, and some of them are used with grammatical errors that a skilled English speaker would be unlikely to make. The most reasonable explanation, then, is that the errors were inserted by a native speaker after writing the idioms.  Examples include:
    1. “or [the] bid pump[s] [the] price up”
    2. “bidding war”
    3. “top friends”
    4. “go bye bye”
    5. “where [does that] leave Wealthy Elites”
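
The inconsistency argument in item 2 could be made quantitative by tallying, for each error type, how often the error appears out of all the contexts where it could have appeared. The following is a minimal Python sketch of that tally, assuming each such context has already been hand-annotated. The function name `error_rates`, the error-type labels, and the sample annotations are illustrative placeholders, not counts taken from the Shadowbrokers texts.

```python
from collections import Counter


def error_rates(annotations):
    """Return the fraction of opportunities in which each error occurred.

    annotations: iterable of (error_type, is_error) pairs, one pair per
    context where the error could have been made.
    """
    opportunities = Counter()
    errors = Counter()
    for error_type, is_error in annotations:
        opportunities[error_type] += 1
        if is_error:
            errors[error_type] += 1
    return {t: errors[t] / opportunities[t] for t in opportunities}


# Placeholder annotations for illustration only (hypothetical, not real data).
toy_annotations = [
    ("infinitive_to_omitted", True),
    ("infinitive_to_omitted", True),
    ("infinitive_to_omitted", False),
    ("progressive_misused", True),
    ("progressive_misused", False),
]

rates = error_rates(toy_annotations)
for error_type, rate in sorted(rates.items()):
    print(f"{error_type}: {rate:.2f}")
```

A rate near 1.0 would suggest a systematic, intuition-driven error. A rate only slightly above 0.5 corresponds to the pattern described above, where errors "occur a majority of the time, but not by much".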
Taken together, these lines of evidence lead to the conclusion that the author is most likely a native speaker of US English who is attempting to sound like a non-native speaker by inserting a variety of random grammatical errors.

In the (unlikely) event that the writer is, in fact, not a native English speaker, their native tongue is much more likely to be Slavic (e.g., Russian or Polish) than either Germanic or Romance.



