Friday, June 9, 2023

Proverbs

  1. It's not about Big Data, it's about the right data!
  2. Data without theory is meaningless; theory without data is sterile.
  3. A peer-reviewed publication isn't a proof. At best it's a line in a proof.

Monday, March 15, 2021

Socially Integrated Computing: A Vision for Computer Science

What is Computer Science?

Computer science as a field started in the middle of the 20th century, with the first general-purpose electronic computers and the theoretical models of Turing, Church, and others. As ideas of computational organization and architecture were developed, and notions of (hardware and software) layers of abstraction evolved, the field burgeoned and grew in many directions, spawning many subfields (architecture, programming languages, algorithm design, complexity theory, and so on). Overall, though, the fundamental object of study in the field is the computational system, which can be thought of as a kind of machine inside a box: we feed inputs into the box, and receive outputs from the box (computed by the machine).


The evolution of the field can be understood as a progressively deeper understanding of, and growing ability to engineer, such computational systems. This evolution is best seen as growth along three interrelated dimensions: power, trustworthiness, and reach.


Power is the ability to perform larger and more difficult computations. Faster and larger computing machines increase power, as does the development of faster algorithms. Complexity bounds enable us to better understand the limits of computational power, and programming language structures enable effective construction of more powerful software systems.


Trustworthiness refers to the extent to which we can ensure that our systems are reliable, and trusted by users, at the task(s) they are designed and deployed to perform. Cybersecurity measures work to ensure trustworthiness in the face of external threats; program verification and software engineering techniques work to ensure it against (inevitable) human error. Consideration of human factors is also essential for computational systems to be trusted by users who are not privy to the system's internal workings or design process.


Reach is about how widely computational systems touch and influence human activities. The earliest electronic computers had limited reach, serving a small set of needs of government and large businesses, while today we carry powerful hand-held computers serving a great variety of personal and social needs, and the growth of IoT promises to increase computing's reach even more.


Growth in each of these dimensions requires, influences, and constrains growth in the others. Increasing power or reach opens up new gaps in trustworthiness that need to be addressed, while increasing trustworthiness and reach tends to require more computational power.


Current Trends

This tripartite lens helps us understand the implications of current trends in computing, as well as the relationships between these developments.

  • Recent successes in machine learning with big data have increased the reach of computing systems into new segments of society and the economy, thereby highlighting new aspects of trust that need to be addressed, such as achieving fairness and dealing with bias in analytical and decision-support systems.
  • Edge computing increases computational power by redistributing computation, concomitantly enabling increased reach, and thus also requires deeper attention to questions of trust in cybersecurity.
  • The complementary technology areas of the Internet of Things and intelligent spaces also greatly increase reach into our lives and daily activities, raising questions of trust related both to cybersecurity and to how understandable and predictable their responses are to us.
  • Similarly, digital manufacturing increases the reach of computing into the physical realm and raises new questions of trust in cybersecurity and in managing complex logistics systems, as does digital twinning.
  • Development of more realistic virtual and augmented reality systems promises a radical increase in computing's reach, and profound social impact. This has been enabled by increases in computing power, and reaching full realism (however defined) will require even more power (cf. edge computing). It will also raise new questions of trust, particularly around privacy, as well as questions about the effects of trust breaches when computing is tightly integrated with everyday activities.
  • Quantum computing promises greater power, and specifically power which undermines public-key cryptography, a main pillar of current computational trustworthiness.
  • On the other hand, blockchain and other distributed ledger technologies enable the creation of distributed systems that can be trusted even if individual actors using the system are not (zero-trust information security), and that are robust to losses on the network (due to distribution). These technologies use the power of the network to create systems that increase trust, and thus enable increased computational reach (via new financial and other applications).


The Next Step

Where, then, is the edge of the field, where innovative research will move it forward?


At the macro level, we see a great deal of possibility in all three main themes. Computational power has reached the limits of Moore's Law, and so is ripe for a deep conceptual shift, whether through quantum computing, parallelism, non-von Neumann architectures, or some idea yet to be discovered. Computational trust is strained (at best) by increases in the power and reach of computing: cybersecurity is an eternal arms race, catalyzed further by demands for efficiency, and increased use of automation (whether AI or otherwise) raises new and evolving questions of algorithmic trust. Computational reach has increased enormously through the ubiquity of computing devices, both personal and IoT, which spread computation to new aspects of our societies, economies, and lives.


In fact, if we consider the effects of many of the current trends identified above, we see one overall theme emerging: the growing intertwining of computational systems with human systems (individual, organizational, social). Personal devices (phones, watches, etc.), IoT, and intelligent spaces deeply connect our daily and minute-to-minute activities with computational adjuncts; as virtual/augmented reality becomes good enough for broad use, this integration will leap even further. In industry, digital twinning and digital manufacturing put computational models at the center of physical production. And while the use of information technology has a long history in finance, the way blockchain has enabled the creation of non-governmental currencies, and the way modern data analytics has increased the power and reach of high-frequency trading and the development of complex financial instruments, have transformed the industry (and left regulation somewhat behind for now). Similarly, e-discovery is transforming staid law offices, computer vision is transforming food safety as well as law enforcement, and machine learning systems are being applied (rightly or wrongly) to a whole host of societal, political, and business problems.


Thus, the broad paradigm shift that is needed is a change in how we think about the object that we study in computer science. Rather than a metaphorical glass box containing just a computational mechanism, receiving inputs from and providing outputs to an external user, we need to expand the box we consider to include the user and their behavior, as part of a larger, more complex computational system, and even, beyond the user, their social/organizational context as well.


This system has enormously more degrees of freedom, and is not fully controllable by the computer scientist, but it cannot be ignored, because of the tight interconnections between the core computational system (hardware, algorithms, data structures) and the information flows and incentives induced in its human users. This view of the proper object of study of our field we may term socially integrated computing, in that the human and social context (understood broadly, to include all relevant human-human connections and interactions) is taken as integral to the computational system to be analyzed and designed.


There is, of course, much work in this vein already, in HCI, in social network analysis, in agent-based modeling, in computational economics and mechanism design, and so on. I believe, though, that these disparate types of research work can best be understood together as aspects of a shift in how we view the field as a whole — socially integrated computing — and that such a unifying view ought to transform it, and its effect on the world, for good.

Tuesday, May 19, 2020

Commencement 2020

It is a great shame that universities were unable to hold regular commencements this year to honor the achievement of our graduates, who not only completed a university degree, but did so under nearly unprecedented conditions, with their entire educational system upended in the middle of their final semester.

The following is the text of my speech to the graduates of the Illinois Tech Computer Science Department this year:
It is my great pleasure to welcome you all today. I am Shlomo Engelson Argamon, Interim Chair of the Computer Science Department at the Illinois Institute of Technology. 
None of us ever expected that your graduation would be like this—we should be together on campus, celebrating you and your accomplishments as you all truly deserve. I look forward to the day that we can invite you to return to campus for our next commencement, and I hope many of you can, so that I can congratulate you in person. 
Today is a day of celebration as we gather virtually to recognize your accomplishments and honor each of you as a 2020 Illinois Tech graduate of the Computer Science Department. I only wish we could now be together, with your parents, family, and friends, to share this great moment of your graduation. 
Congratulations! Your hard work at Illinois Tech has set the foundation for your future professional success. 
You are graduating into a world of enormous challenges; the reverberations from this pandemic will echo economically and socially for many years to come. And for you, your final year at Illinois Tech has sadly been disrupted, and you may already be facing, or soon face, personal challenges as well. If you do, I pray you overcome them with grace and dignity.
Amidst all these challenges, you are also graduating from Illinois Tech into the field that will be central to addressing them; indeed, without your field, we could not even have held this event today. Ironically, computer scientists, the stereotypical socially inept geeks, are the ones now enabling people to remain connected with one another in these difficult times.
Computer science is not, and cannot be, just pure technical brilliance walled off from the messiness of human relationships and interactions. The systems we create now mediate and shape human society in profound ways, both beneficial and harmful.
Many of you, I expect, will become great innovators, as researchers, developers, and entrepreneurs. I urge you not to get lost in the admittedly fascinating technical complexities of our discipline. Always consider carefully the larger "computational system" of people, and relationships, and organizations within which your work is embedded.
This is both a technical, and a moral, imperative.
Disinformation is now rampant, spread and incentivized as a side effect of the algorithms that make social media efficiently monetizable; we are slowly starting to realize how self-driving car modes encourage reckless driver behavior; holes in system security inevitably lead to the terrorizing of children through internet-enabled baby monitors, or of videoconferences through zoombombing, all for the lulz; and on and on.
The future of our field and its enormous effects on the world depend crucially on this fundamentally human perspective, which will affect not only the applications we develop but the very nature of the field itself. 
Illinois Tech was founded 130 years ago on the idea that a first-rate technical education could enable individuals to change their lives and the world for the better. We have endeavored to give you that education; the rest is up to you.
Graduates of 2020 – I cannot wait to see what wonders you each create. Dream high, and celebrate your successes, both large and small.
As of today, you are all Illinois Tech alumni. Bear that title proudly as you move on in your lives, and please stay in touch with the Computer Science Department, with Illinois Tech, and with one another. Wherever you may go, we at Illinois Tech will follow your future accomplishments with keen enthusiasm.

Tuesday, May 14, 2019

Musing: Role of Attention in Language Development

Some thoughts about developmental linguistics, based on observations of my kids. These ideas are admittedly naïve, and I don't know the relevant literature, so I would appreciate any expert feedback, whether support or refutation, and pointers to reading if these ideas are old hat.

Observation 1.

My 8-year-old frequently says things like, "John, he was playing basketball," topicalizing the subject by left-dislocating it. (I'm not sure if this is only for animates.) This is not usual in our dialect, so I doubt that the kid has often heard it. My wife and I incessantly correct this usage (yes, yes, intellectually I am a good descriptivist, practically, however...), and yet it persists strongly.

This is rather interesting, I think.

To the best of my (limited) understanding, most syntactic formalisms are structured around subject-predicate relations and treat information structure as piggy-backing on those fundamental mechanisms, but this observation makes me wonder whether information structure might be more fundamental in some sense.


Observation 2.

Observation of my (slightly language-delayed) not-yet-two-year-old leads me to consider further a foundational role for attention and indexicals. (I know the latter is a rather old idea.)
The child has excellent comprehension, as far as we can tell, as he responds appropriately to quite complex utterances, and makes himself understood (with difficulty) mostly through grunts and gestures (and a slowly growing vocabulary).  
What I've noticed, though, is that he achieves communication largely by coordinating and controlling mutual attention. He will gesture so as to capture my attention and direct it to an important object; in fact, he will often grab my face and turn it to face him as he gestures, or to face what he is interested in, before gesturing and grunting his "utterance".
Much of his speech therapy also uses attention as a key mechanism; vocabulary is given to him as he attends to certain objects in play, or to draw his attention to those objects. 


The primacy of vocatives in early speech is also closely related to attention coordination: very simply, a vocative is "getting someone's attention".

Information structure, whether topic, focus, or given/new, is also about coordinating attention between the speaker and hearer.

Language development would seem, therefore, to depend critically on elements of theory-of-mind having to do with (at least) attentive focus (and related notions such as affordances and intentions). Notions such as "reference" would thus be constructed socially, in a sense, by mutual attentive connection of a linguistic unit and the referent.

(I think the notion of mutual attentive focus largely solves Quine's "gavagai" conundrum.)

Please forgive me if these musings are old-hat, utterly naïve, or just plain stupid. But if you are better informed about these matters, please comment with some relevant reading...

Saturday, January 19, 2019

There is always bias (or, binary numbers are not the villain)

Twain Liu just wrote a piece on Quartz, entitled "Aristotle’s binary philosophies created today’s AI bias". This article is riddled with buzzword-laden arguments by feeling, such as summarizing "the very system on which all modern technology is built" as:
1 = true = rational = right = male
0 = false = emotional = left = female
The false dichotomies built into this asseveration rattle the brain. Indeed, the entire essay has this flavor, and it is not even false.

But since the piece has gotten a fair bit of attention, I feel the need to respond to the key claim of the piece. The entire argument rests on the dual assertion that the fact that computers use binary numbers (1s and 0s) as the basis for their operation is (a) based on Aristotle's (elitist, sexist) philosophy, and (b) the fundamental reason why algorithmic systems are biased. Hence, new computer systems not based on Aristotelian "binary" logic can be universal, unbiased pure goodness.

Well.

First off, the claim that "computers are binary and essentially invented by Aristotle" is a load of argle-bargle and pure applesauce. (Clickbait headlines in the Atlantic notwithstanding.) When electronic computers were first being developed in the 1940s and 50s, different systems were experimented with, including ternary (three-valued) logic, but binary was the most practical, for a simple reason: with binary logic, you can represent a 0 by "voltage close to 0" and a 1 by "voltage close to maximum". When you introduce more possible values, the system becomes more sensitive to noise, and hence less reliable. (There are other technical reasons for binary computing, and there are some other reasons to prefer ternary systems, but this is enough for my purposes.)
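
To make the noise point concrete, here is a toy simulation in Python (a rough sketch only, not a model of real circuit engineering; the voltage range, noise level, and trial count are arbitrary assumptions): with the same 0-to-1 volt range and the same Gaussian noise, a ternary decoder must distinguish levels that sit closer together than binary levels do, and so it misreads symbols far more often.

    import random

    def error_rate(levels, noise_sd, trials=100_000):
        """Send random symbols encoded as evenly spaced voltages in [0, 1],
        add Gaussian noise, decode to the nearest level, count mistakes."""
        voltages = [i / (levels - 1) for i in range(levels)]
        errors = 0
        for _ in range(trials):
            sent = random.randrange(levels)
            received = voltages[sent] + random.gauss(0, noise_sd)
            decoded = min(range(levels), key=lambda i: abs(voltages[i] - received))
            if decoded != sent:
                errors += 1
        return errors / trials

    random.seed(0)
    print("binary :", error_rate(2, noise_sd=0.2))  # levels 1.0 V apart
    print("ternary:", error_rate(3, noise_sd=0.2))  # levels 0.5 V apart

With these (made-up) numbers, the binary channel misreads around one symbol in a hundred, while the ternary channel misreads more than one in ten, which is the whole engineering argument in miniature.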

Now, to bias. Binary numbers have nothing whatsoever to do with algorithmic bias. The binary number system does not limit you to using only 1 or 0 for the values you need to represent (after all, you could not specify an address to Google Maps as just a 1 or a 0, say). Indeed, you can represent as many different values as you like by stringing bits together; you can have as many categories, of whatever you like, as you like. Any computer scientist would recognize this aspect of the claim as laughable.
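
To see just how unconstraining this is, here is a trivial sketch (the category names and bit-width are arbitrary, chosen purely for illustration): four categories encoded losslessly in two bits, since n bits can index 2**n distinct values.

    # Binary hardware does not limit us to two categories: n bits can
    # index 2**n distinct values. Here four arbitrary categories are
    # encoded in two bits and recovered exactly.
    categories = ["north", "south", "east", "west"]

    def encode(value, n_bits=2):
        return format(categories.index(value), f"0{n_bits}b")  # e.g. "10"

    def decode(bits):
        return categories[int(bits, 2)]

    for c in categories:
        bits = encode(c)
        print(c, "->", bits, "->", decode(bits))
        assert decode(bits) == c  # the round trip is lossless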

Algorithmic bias is due to the simple fact that all decision systems have biases. (Indeed, it is impossible to learn anything from experience without some sort of bias.) No real system has perfect information, and any decision made on the basis of imperfect information is biased in some way. The question is not "Can we create unbiased algorithms?" but "Do we know what our algorithm's biases are?" and "Can we mitigate the ones we do not like?"
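
As a deliberately tiny illustration of that last point (the groups, sample sizes, and decision rule here are all made up): a rule fit to a skewed sample is systematically biased, regardless of how the machine underneath encodes its values.

    def fit_majority_rule(labels):
        """'Learn' by always predicting the most common training label."""
        return max(set(labels), key=labels.count)

    # True population: an even 50/50 split between groups A and B.
    population = ["A"] * 50 + ["B"] * 50
    # Training sample: imperfect information that over-represents A.
    sample = ["A"] * 80 + ["B"] * 20

    rule = fit_majority_rule(sample)
    accuracy = sum(rule == y for y in population) / len(population)
    print("learned rule: always predict", rule)          # -> A
    print("accuracy on the real population:", accuracy)  # 0.5: every B is misjudged

The bias here comes entirely from the unrepresentative sample and the decision rule; the number system plays no role at all.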

Utopian visions like Ms. Liu's, that with the right philosophy we could build computer systems that are universal and unbiased, pure purveyors of algorithmic goodness, are false and actually dangerous. They promote the technocratic idea that there are unbiased algorithms out there, if we could just find them, and so keep our focus narrowly on algorithmic development.

However, bias is inevitable. The way to combat pernicious bias is through continuous monitoring to discover instances of problematic bias, and the exercise of good judgment in adjusting systems (whether algorithms, training data, how the systems are used, etc.) to mitigate the bad effects while maintaining the good ones. The proper way to combat algorithmic bias (which some are working on) is to develop better ways of detecting and characterizing such bias, together with the societal institutions and incentives that enable dealing with such deleterious biases. (And this leads into questions of value systems and politics, which cannot be avoided in this arena. There is no royal road.)

Visions of simple solutions derived from proper thinking are seductive. But the necessary condition for developing and maintaining diversity-enhancing technologies will be, I'm afraid, eternal vigilance.

Monday, January 7, 2019

Open Letter to PSU VP of Research Mark McLellan

Peter Boghossian is an assistant professor of philosophy at Portland State University. He recently, with two non-academic colleagues, published an account of an effort they made to probe peer-review methods within certain fields of inquiry that they term "grievance studies". Briefly, they wrote academic articles based on fanciful theories and hypotheses, matching as well as possible the style of writing and argumentation in the fields they addressed, and managed to get several articles accepted at leading journals. After doing so, they published their account of their effort, revealing the deception. This, they argue, has implications regarding the reliability of peer-review in those fields and perhaps regarding the legitimacy of the fields' methods themselves. I express no opinion regarding their study or conclusions.

What I am writing about is the response of Boghossian's institution, which was to investigate him for research impropriety, and ultimately to determine that Boghossian's "efforts to conduct human subjects research at PSU without a submitted nor approved protocol is a clear violation of the policies of [his] employer."

Unless the facts are substantially different from what has been published, this case raises concerns about academic freedom and freedom of inquiry. It is debatable at best whether Boghossian's work required IRB review at all, and even if it did, the situation does not rise to the level of research malfeasance. If any readers have more information about the case, please let me know.

Below is the letter I wrote about the case this morning to Prof. Mark McLellan, Vice President for Research and Graduate Studies at Portland State University. Obviously, I speak only for myself, not for my institution.


Dear Prof. McLellan,

I have read with some concern of the investigation and conviction of Prof. Peter Boghossian for unethical research practices. This is a serious charge, and as such warrants proper due process and full consideration of all relevant facts and circumstances. For the reasons I will detail below, I believe this not to have been the case here, and I urge reconsideration of this case, for the sake not only of Prof. Boghossian, but also of the reputation of Portland State and of the institution of the IRB.

I also note that, generally speaking, a first accusation or offense of this kind (lack of IRB review for research that did not result in proven tangible harm) results in a warning and a discussion with the faculty member, rather than a proclamation that they have unambiguously violated ethical norms and university policy. Consider, for example, the very long time, and the repeated discoveries of egregious and intentional research malfeasance (far beyond anything that Prof. Boghossian is accused of), that were necessary before Dr. Wansink was finally censured by Cornell.

In the case of Prof. Boghossian, there are three essential questions whose answers would determine whether the project was subject to IRB review, and whether the project as conducted was unethical in any way.

First, was the project "research"? I believe the answer here is indeed "yes", since the project was undertaken to develop knowledge and disseminate it, in this case about the peer-review practices in certain fields of inquiry.

Second, did the project involve "human subjects"? Clearly, the fabricated research studies used as experimental probes did not. The reviewers of these articles, while part of the phenomenon under study (the "peer-review system") also were not human subjects per PSU's Human Subjects Research Review Committee Policy, which states:
A human subject is a living individual about whom an investigator obtains data, either from intervention or interaction with the individual, or through records which contain identifiable private information.
Since the peer reviewers were entirely anonymous and not identifiable, the investigators cannot be considered to have been obtaining data about them: no private information whatsoever was gathered, and the reviewers were performing their usual professional function. Thus they cannot be considered human subjects by this definition, and the research was not subject to IRB review.

Furthermore, even if the project had been reviewed, it would have been exempt under 45 CFR 46.101, as "Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interview procedures or observation of public behavior" with anonymous subjects. As such, at worst Prof. Boghossian should be admonished to seek IRB review for such research in the future.

Third was the accusation of "fabricating data" due to one of the fabricated research articles containing made-up statistics about canine sexual activity. Clearly, since the article was not intended to remain a part of the research literature, but to be unmasked as false, there was no intent to deceive the research community. As such, this was not fabrication or falsification of research data or results.

Taken together, the facts seem clear that Prof. Boghossian's project never warranted IRB review at all, or if it did, would have been exempt. In any case, the only potential consequence should be a discussion with him regarding the importance of undergoing IRB review for future such projects. I urge that Portland State rescind its determination that he violated university policy, and restore his professional and academic standing within the university to the status quo ante.

I would be happy of course to discuss this matter further if it would be of use.

Sincerely,

Shlomo Engelson Argamon
Professor of Computer Science
Director, Master of Data Science
Illinois Institute of Technology

Tuesday, October 17, 2017

Calls to Jettison "Statistical Significance"

I recently wrote an opinion piece for American Scientist's Macroscope blog, arguing (inter alia) that the notion of "statistical significance" harms the epistemology of science, and should therefore be jettisoned. After I'd submitted the piece, I saw other articles, letters, and posts saying the same thing, arguing from a number of technical and sociological bases. I will gather here, to the best of my ability, a list of these articles as a resource for the community. If you know of others that I've missed, please add them in the comments.