The Turing Hub as a Standard for Turing Test Interfaces.

OBSERVATIONS

Loebner Prize Contest entrants have progressed [1] as web technology has become more and more prevalent.[2][3][4] In addition to commercial backing in some cases, this kind of software now benefits from more iterations of refinement, more advanced technologies, and more resources in general.[5]

INDICATIONS

The impact this has on a Turing test like the Loebner Prize Contest is indicative of the transition being made everywhere from older console-based, mainframe-like software with simple interfaces to newer web-based applications, which require a web browser and a graphical user interface. For the 2002 Loebner Prize Contest [8], we used web-based technology to interface with these newer applications.[5] To present the interaction between the software and the judge, we used a simple Java applet, which gives a uniform appearance to the entries using it.[6] The Turing Hub provides a working alternative.

CONSIDERATIONS

Although I have read Dr. Loebner's paper [7], "How to Hold a Turing Test Contest," I do not agree with him about some things. I think the selection of judges should be representative of the general population. The presence of journalists among the judges and confederates is not what I would want if I were going to try to measure the performance of chat robots against some normal distribution of the general population. If you want a simple, easy-to-run competition of console programs, don't use the Turing Hub.

RECOMMENDATIONS

The original Turing test proposed that the judge and his subject would interact via teletype. This inevitably translated to a video display terminal, and now that web-based conversant systems are so prevalent, it seems obvious to try to establish some kind of standard HTTP interface to be used when performing the Turing test.

In the past, it has always been difficult to get all the programmers in the contest to use the same interface. Dr. Thomas Whalen used the confederates' communications interface in 1994, which may have been a factor in his winning that year. The next year, he was beaten by Weintraub's PC Therapist, which interestingly was a stand-alone program running in DOS. [8]

One problem with the old communications program was that the judge could see the confederate typing each letter, so contenders were forced to try to imitate a human typing responses. This introduces a whole new aspect of human behavior, which I feel detracts from the verbal behavior that is most relevant to the interactions. Other delay factors come into play as well. The time required for an interlocutor to respond is equal to the time required for signal transmission, plus the time it takes to read, comprehend, and formulate a reply, plus the time it takes to type the reply. These are further aspects of human behavior that must be imitated in a communications system like this. They represent performance metrics that would vary among human participants, but they also add visual cues as to the nature of the interlocutor. The Turing Hub attempts to eliminate many of these problems and to provide a standard interface for the competition.

A simple solution for the delay factor is to provide a standard delay, based on the length of the utterance plus some extra time for uniformity. Another way to level the playing field in the test is to enforce a "volleyball" exchange format, so that each person must wait until the other person responds before being able to say something else. This is more in line with the nature of the computer software competitors, and it puts the comparison on a standard basis: the messages being exchanged rather than the nature of the exchange mechanism.
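The "volleyball" format described above can be sketched as a small turn-taking guard. This is only an illustration, not the hub's actual code; the class name `VolleyExchange` and the speaker labels are assumptions made for the example.

```python
class VolleyExchange:
    """Strict alternation between two interlocutors (hypothetical helper)."""

    def __init__(self, first_speaker, second_speaker):
        self.speakers = (first_speaker, second_speaker)
        self.turn = 0          # index of the speaker allowed to talk next
        self.transcript = []   # list of (speaker, utterance) pairs

    def say(self, speaker, utterance):
        # Reject a second message from the same side until the other replies.
        if speaker != self.speakers[self.turn]:
            raise RuntimeError(f"{speaker} must wait for a reply")
        self.transcript.append((speaker, utterance))
        self.turn = 1 - self.turn  # hand the turn to the other side
```

With this guard, a judge who sends "Hello" cannot send a second line until the entry has answered, which is the behavior the exchange format is meant to enforce.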

At the 2002 Loebner Prize Contest, we were able to connect three computer programs and two human beings through the Turing Hub. The rest of the computer entries were either stand-alone programs or were running via a web browser from a web server on a VPN.

Being able to apply a uniform delay factor removes visual cues, and eliminates the need for having "fake typing" algorithms, and other human behavioral imitations. This allows for more focus on the goal of the contest, in terms of the messages themselves, with little emphasis placed on the means of transmission.

The Turing Hub makes it easy to set up the contest for web-based entries, using standard terminology for form posting details. The software allows for setting up entries to run from any IP or web address, from a local network, or from anywhere on the Internet. Connecting two human beings together works similarly, but the current prototype requires some special Java security settings in order to operate. In future revisions, I intend to make it possible for any web browser to communicate via the hub.

An advantage that "hub ready" applications have over their stand-alone counterparts is that the hub provides detailed conversation logs with timestamps and remote IP information. The ability to analyze, export, or re-digest transcripts is facilitated by my use of DataFlex as a programming language.
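As an illustration of the kind of log record described above, here is a minimal sketch. The hub itself is written in DataFlex; the field names, the ISO-style timestamp text, and the CSV export are all assumptions made for this example.

```python
import csv
import io
from dataclasses import dataclass, astuple


@dataclass
class LogEntry:
    timestamp: str   # assumed to be stored as ISO 8601 text
    remote_ip: str   # address the message came from
    speaker: str     # e.g. "judge" or the name of an entry
    utterance: str


def export_csv(entries):
    """Return the log entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["timestamp", "remote_ip", "speaker", "utterance"])
    for entry in entries:
        writer.writerow(astuple(entry))
    return buffer.getvalue()
```

A transcript exported this way can be loaded into a spreadsheet or re-digested by another program.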

Further modifications for the Turing Hub could include a timer for limiting the length of the conversation, both for uniformity and as a limiting factor. Another feature would be the ability for a person to vote on whether he was talking with a computer or another human being. This would provide up-to-date comparisons of the people and robots connected to the Turing Hub. A current "most human" computer program would be available at all times.

 

IMPLEMENTATION

The Turing Hub is like a switchboard. It currently uses a Java applet which looks a bit like a chat room interface, except that the current configuration only has two interlocutors at a time.

The prototype which I wrote for the 2002 Loebner Prize Contest requires that the web browser be set to allow "unsigned" or "out of the sandbox" operation. This is because I used Java in a rapid application development (RAD) approach, so that I could quickly create a way to send and receive post information to and from the bots, and to send and receive information to and from the hub.

Here is a crude data flow diagram:

Hub <=====> Applet <=====> Bot


After the contest, I began working on a new prototype in which the hub communicates with the bots, so that the Java applet can run in the sandbox and won't require anything but a JVM-enabled browser.

Applet <======> Hub <======> Bot

I deferred this design until after the contest because it involves more coding and more searching for details than the first prototype did.

In the next revision, I will concentrate on giving it a user account and login mechanism. Then I will write the key ingredient for the 24/7 Turing test, which will be an algorithm for letting a person log in and then, at random, be connected either to a randomly selected computer system or to another person. In the latter case, both people would be judges! The applet will time the conversations and tell the hub when each one is finished. A new screen will then pop up with a form where the person answers "was it human?" or something to that effect. The hub will tally the results and be able to give real-time calculations of each conversant system’s Turing percentage.
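The planned pairing and tallying steps might be sketched as follows. This is a rough illustration under stated assumptions, not the hub's design: the 50/50 choice between a human and a computer partner, the names `pick_partner` and `Tally`, and the use of "share of votes judged human" as the running percentage are all hypothetical.

```python
import random


def pick_partner(judge, waiting_humans, bots, rng=random):
    """Return ("human", name) or ("bot", name), chosen at random.

    Assumes `bots` is non-empty. The even coin flip is an assumption;
    the text only says the choice is made at random.
    """
    others = [h for h in waiting_humans if h != judge]
    if others and rng.random() < 0.5:
        return ("human", rng.choice(others))
    return ("bot", rng.choice(bots))


class Tally:
    """Running count of "was it human?" votes per interlocutor."""

    def __init__(self):
        self.votes = {}  # name -> [times judged human, total votes]

    def record(self, name, judged_human):
        counts = self.votes.setdefault(name, [0, 0])
        if judged_human:
            counts[0] += 1
        counts[1] += 1

    def percentage(self, name):
        """Percentage of votes that judged `name` human, or None if no votes."""
        if name not in self.votes:
            return None
        judged, total = self.votes[name]
        return 100.0 * judged / total
```

When no other person is waiting, the sketch always falls back to a computer partner, so a lone visitor can still hold a conversation.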

The calculation of the Turing percentage is not difficult. Turing's original test was one-on-one, and his predictions concerned the percentage of the time a program would win against a human. They assumed that a human against a human should win around 50% of the time (averaged over all humans competing against humans that will be exact, but for individuals it will vary enormously). In the Loebner Prize, the ranking tells you, for any two entrants, who beat whom.

So given C computers and H humans, each computer is matched against H humans and scores W wins and L losses where W+L=H, and the percentage won is thus 100*W/H. Similarly, each human is matched against H-1 other humans, wins W and loses L where W+L=H-1 with percentage won being 100*W/(H-1).
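The two formulas above translate directly into code. The function names are chosen for this example only.

```python
def computer_percentage(wins, humans):
    """100*W/H for a computer matched against H humans."""
    if humans <= 0 or not 0 <= wins <= humans:
        raise ValueError("need H > 0 and 0 <= W <= H")
    return 100.0 * wins / humans


def human_percentage(wins, humans):
    """100*W/(H-1) for a human matched against the H-1 other humans."""
    opponents = humans - 1
    if opponents <= 0 or not 0 <= wins <= opponents:
        raise ValueError("need H > 1 and 0 <= W <= H-1")
    return 100.0 * wins / opponents
```

For example, a computer that beats 1 of 4 humans scores 25%, and a human who beats 2 of the 4 other humans in a field of 5 scores 50%.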


The special case of human-to-human has this data flow diagram:

Human1 <=====> Hub <=====> Human2

This part works on the principle of the "dead drop" that spies use to send information back and forth. By design, there is no TTY-like activity. Instead, the hub provides a uniform delay made up of a constant representing the time to read the stimulus, plus a delay based on the typing speed of someone at 25 words per minute. This is a minimum; the actual delay can be longer than that.
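A minimal sketch of such a delay formula follows. The 25 words per minute comes from the text; the two-second reading constant and the choice to count whitespace-separated words in the reply are assumptions, since no values are given here.

```python
WORDS_PER_MINUTE = 25   # typing speed named in the text
READ_SECONDS = 2.0      # assumed constant; the text gives no value


def standard_delay(reply):
    """Seconds to hold a reply: reading constant plus typing time."""
    words = len(reply.split())
    return READ_SECONDS + words * 60.0 / WORDS_PER_MINUTE
```

At 25 words per minute each word costs 2.4 seconds, so a five-word reply is held for 2 + 12 = 14 seconds under these assumptions.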

For the conversant software, the delay is very simple. When the judge types something, the current time is recorded, and the post operation is then performed on the remote system. If the time elapsed when the response is received exceeds the delay given by the formula above, the response is displayed immediately. If the bot responds more quickly, the response is held until the standard delay has passed.
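The rule for software entries can be sketched as a single function that returns how much longer to hold a response. The function and parameter names are hypothetical, and times are assumed to be in seconds.

```python
def remaining_wait(sent_at, received_at, standard_delay):
    """Seconds still to wait before displaying a bot's response.

    sent_at: time recorded when the judge's utterance was posted
    received_at: time the bot's response arrived
    standard_delay: the uniform delay computed for this response
    """
    elapsed = received_at - sent_at
    # A slow bot has already used up the delay; show its reply at once.
    return max(0.0, standard_delay - elapsed)
```

A bot that answers in 3 seconds against a 10-second standard delay is held for 7 more seconds, while one that takes 12 seconds is displayed immediately.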

The Turing Hub currently requires a Microsoft IIS web server, or it may be used on Linux with the Apache web server and JSP. I am writing another version in Perl that should work nicely on UNIX or Linux based web servers.

Most of the web-based applications use an HTML form which is submitted to a CGI application, PHP, ASP, or other form submittal mechanism. Because the entries differ only in their form requirements, those requirements can be generalized, so that a Java applet or other program can simulate a person visiting the web page and conversing by submitting utterances to the conversant system’s response engine. The responses made by the conversant system are then received by the applet and displayed in a chat room style interface, as if someone had typed the response, but without the fake typing or other visual cues.
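The form-posting step can be illustrated with Python's standard library. This sketch only builds the request and does not send it; the URL, the field name `input`, and the helper name `build_post` are hypothetical, since each entry defines its own form fields.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post(form_url, field_name, utterance, extra_fields=None):
    """Build the POST request a browser would send for the entry's form."""
    fields = dict(extra_fields or {})   # e.g. hidden fields the form needs
    fields[field_name] = utterance
    body = urlencode(fields).encode("utf-8")
    return Request(
        form_url,
        data=body,  # supplying data makes urllib issue a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the utterance, and the body of the reply would then be read and shown to the judge.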

Most stand-alone entries to the Loebner Prize Contest are DOS applications and do not have an HTTP interface. ALICE is available in several web-based forms, but Richard Wallace decided to use a DOS application for the 2002 Loebner Prize Contest. This put his program at a great disadvantage, because some close competitors were running via the hub interface and so looked more like the human beings in the contest.

Since the Turing Hub can work with conversant systems on computers located anywhere in the world, there are new considerations that must be evaluated when performing a Turing test using the hub. Some have expressed concerns that it would be easy to make a program that lets a person type into the hub system instead of interfacing it with a conversant software system. This is a very real possibility. The concern is that a person would be able to substitute his replies and thereby fake the Turing test. In general, it is very easy to tell the computers from the humans, although the state of the art is improving. However, there is nothing to stop a human being from imitating a computer during a Turing test. It seems only fair that if the computer must lie to say that it is a human, the human should be able to lie and say that he is a computer. [9]

It has been suggested that, by removing all unnecessary delay factors from the conversant system, it could be possible to prove that a response came faster than a human being could reasonably provide it. However, on a remote system this would require a response time equal to the time for Internet transmission plus a very small processing-time constant. There are inherent flaws with this idea, ranging from the counterproductive requirement that conversant systems not attempt to simulate human response times to possible ways of cheating this method too. Verifying the nature of the conversant software would be a problem inherited by allowing entrants to operate from remote locations during the contest, even though these systems are operating remotely 24/7 already.

The hub computer’s web logs would be needed to prove that the transactions originated at the same computer during the conversation. An encrypted response mechanism could be used to reduce the likelihood of cheating, and would avoid the problems of the time limit proposal. The encryption option might be used in addition to some legal means of contract, like an affidavit of compliance, or perhaps a non-refundable entrance fee.

The results [8] indicate that programs running via the hub were ranked as more human than their stand-alone counterparts, quite consistently. They also tended to be ranked higher than the other web-based entries that used standard web entry forms. I believe that a contest where all the entries used this interface should allow conversations of at least 15 minutes to allow for the best comparison of each system. When the other visual cues are removed, and there are human beings chatting via the same interface, the evaluation comes down to a characterization of each entry’s conversations: their content rather than their mode of delivery.

The implementation of the original Turing Hub prototype in the 2002 Loebner Prize Contest was not without its technical snafus. Those problems arose mostly from two causes: the Java permissions in the web browsers, and the counterintuitive nature of the confederate-to-judge interactions. The remote humans were connected via a web browser, and without proper instructions on how to initiate these conversations, my first prototype would hang up. It worked fine in the lab, but when it was put in front of people without any instructions, they found it counterintuitive to sit and wait for the other person to reply before being able to send their own message. Consequently, the person would close the browser or otherwise disrupt the communications, so that the proper handshake could not be established. On a local network, the only considerations I required were some simple “getting started” instructions for judges, per the requirements of the traditional rules posted, e.g. typing “@@01” to start a conversation for judge 01.

CONCLUSION

In order to make comparisons of the “state of the art” systems, some of which are already available commercially, web-based technology must be included, since it represents the cutting edge for technical advancement of the craft. The Loebner Prize Contest must keep pace with technology or else lose relevance to the very question it is based on: “Can machines think?” The Turing Hub provides a consistent, easy to use, network compatible interface for performing the Turing test on modern computer systems.

It is possible that the Java chat applet may contribute to the absolute uncertainty as to whether the person is chatting with a human being or not. The absence of visual cues may leave a person with a predilection for professing to be conversing with a human being. The fewer the indicators, the fewer the mistakes detected, and this can lead to robots passing for human.

Try out the Turing Hub at www.turinghub.com

REFERENCES

[1] Computer Modeling in the Loebner Prize Contest, Ken R. Stephens, Cambridge Center for Behavioral Studies, 2002. http://www.behavior.org

[2] The Evolution of Intelligent Agents on the Web, P. Nathan and R. Garner, 1997. http://www.turinghub.com/Studio/paper1_5.html

[3] Internet Statistics: Growth and Usage of the Web and the Internet, Matthew Gray, 1996. http://www.mit.edu/people/mkgray/net/

[4] Hobbes' Internet Timeline, Robert H'obbes' Zakon, 2002. http://www.zakon.org/robert/internet/timeline/

[5] EllaZ description, Kevin Copple, 2002. http://www.EllaZ.com

[6] CyberMecha.com and Turing Hub, 2002. http://www.cybermecha.com/Studio

[7] How to Hold a Turing Test Contest, Hugh Gene Loebner, 2002. http://www.loebner.net/Prizef/loebner-prize.htm

[8] 2002 Loebner Prize Contest, Oct. 12th, Institute of Mimetic Sciences, 2002. www.loebner-atlanta.org

[9] How I Failed The Turing Test Without Even Being There, Robby Glen Garner, Blather, 2002. http://www.blather.net/articles/loebner_turing_garner.htm

Copyright ©2002, Robby Garner. All rights reserved.