November 30, 1999
A hotel room has a bed, night table, minibar, ...
A person has zero or one husbands.
A play lasts about two hours.
Cassette means cassette tape or cassette recorder.
Gold hair is called blond hair.
One hangs up at the end of a phone call.
People have fingernails.
Rough synonyms for food are foodstuffs, groceries, chow, grub, ...
Soda is a drink.
Someone who is 16 years old is called a teenager.
The sky is blue.
Human commonsense knowledge is huge, probably consisting of over 100 million items. So far, ThoughtTreasure only contains 100 thousand items. But even a small amount of common sense will improve applications. For example, a calendar and contact manager can use ThoughtTreasure to:
ThoughtTreasure contains a database of 25,000 concepts organized into a hierarchy. For example, Evian is a type of flat-water, which is a type of drinking-water, which is a type of beverage, which is a type of food, and so on.
Each concept has one or more English and French approximate synonyms, for a total of 55,000 words and phrases. For example, associated with the food concept are the English words food and foodstuffs and the French words aliment and nourriture (and others).
ThoughtTreasure contains 50,000 assertions about concepts, such as: a green-pea is a seed-vegetable, a green-pea is green, a green-pea is part of a pod-of-peas, and a pod-of-peas is found in a typical grocery store.
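Internally the hierarchy is nothing exotic: concepts are nodes and isa links are edges, so a query like is Evian a kind of food? is a walk up the tree. As a rough illustration only (ThoughtTreasure itself is written in C; the class and names below are invented for this sketch), such a lookup might look like this in Java:

import java.util.*;

public class Ontology {
    // Child concept -> parent concept (single inheritance for brevity;
    // ThoughtTreasure concepts can have several parents).
    static final Map<String, String> isa = new HashMap<>();

    // True if concept c is, transitively, a kind of concept a.
    static boolean isaP(String c, String a) {
        for (String x = c; x != null; x = isa.get(x))
            if (x.equals(a)) return true;
        return false;
    }

    public static void main(String[] args) {
        isa.put("Evian", "flat-water");
        isa.put("flat-water", "drinking-water");
        isa.put("drinking-water", "beverage");
        isa.put("beverage", "food");

        System.out.println(isaP("Evian", "food"));  // true
        System.out.println(isaP("food", "Evian"));  // false
    }
}

Assertions such as [green green-pea] hang off the same concept nodes and are retrieved by matching, with the hierarchy supplying inheritance.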
ThoughtTreasure contains about 100 scripts, or computer-understandable descriptions of typical activities such as going to a restaurant or birthday party.
ThoughtTreasure contains 70,000 lines of C code implementing:
Here's an example Java program that accesses ThoughtTreasure:
import com.signiform.tt.*;

public class Example {
  public static void main(String args[]) {
    try {
      TTConnection tt = new TTConnection("somehost");
      System.out.println(
        tt.chatterbot(String.valueOf(TT.F_ENGLISH),
                      "Who created Bugs Bunny?"));
      tt.close();
    } catch (Exception e) {
    }
  }
}

We first import the ThoughtTreasure Java-based client API. In the program's main method, we create a connection to a ThoughtTreasure server, which we have previously started running on a Linux PC somewhere. Then we invoke the chatterbot method with a question in English and print the result. Finally, we close the connection.
When we run the program on any machine that supports Java with a net connection to the Linux PC, it prints:
Tex Avery created him.
How did ThoughtTreasure produce this answer? First, text agents look at the question Who created Bugs Bunny? and identify words and phrases. In this case, we have the word Who, which can be a pronoun or a noun, as in the rock group the Who. The word created can be either a preterit or a past participle. Bugs can be either the plural of bug or the singular of the first name Bugs. In addition, the entire phrase Bugs Bunny is picked up as a singular noun, and Bunny is picked up as a single-word noun:
[H <Who.Hz:who 0-3:<Who>]
[N <Who.SNz 0-3:<Who>]
[V <created.iVz:create 4-11:<created>]
[V <created.dVz:create 4-11:<created>]
[N <Bugs.PNz:bug 12-16:<Bugs>]
[N <Bugs.SNz¸ 12-16:<Bugs>]
[N <Bugs Bunny.SMNz<?\n 12-23:<Bugs Bunny?\n>]
[N <Bunny.SNz:bunny<?\n 17-23:<Bunny?\n>]

The syntactic parser then takes the parse nodes produced by the text agents and builds parse trees. Among them are two parses that differ as to whether Who is a noun or a pronoun:
[Z [X [H <Who.Hz:who>]]
   [W [W [V <created.iVz:create>]] [X [N <Bugs Bunny.SMNz>]]]]
[Z [X [N <Who.SNz>]]
   [W [W [V <created.iVz:create>]] [X [N <Bugs Bunny.SMNz>]]]]

The semantic parser takes the parse trees produced by the syntactic parser and converts them into assertions. Here we have four possible interpretations: Who created Bugs Bunny, the character? or Who created the Bugs Bunny cartoon? or Did the rock group the Who create the Bugs Bunny character? or Did the rock group the Who create the Bugs Bunny cartoon?:
1.0:[create human-interrogative-pronoun Bugs-Bunny]
0.9:[create human-interrogative-pronoun Bugs-Bunny-cartoon]
0.6:[create rock-group-the-Who Bugs-Bunny] ?
0.5:[create rock-group-the-Who Bugs-Bunny-cartoon] ?

Then the question understanding agents run on each of these interpretations. One agent does a database lookup to find a person who created Bugs Bunny, and it finds that Tex Avery did. The agent assigns a make-sense rating of 1.0 to the first interpretation, since an answer was found:
1.0:[create Tex-Avery Bugs-Bunny]

Another question agent looks in the database to see whether the rock group the Who created Bugs Bunny and determines that it did not. So it returns the answer No, the rock group the Who did not create Bugs Bunny and assigns a low make-sense rating to that interpretation of the question:
.11:[not [create rock-group-the-Who Bugs-Bunny]]
.10:[not [create rock-group-the-Who Bugs-Bunny-cartoon]]

After the question answering agents have run on each of the interpretations, the one with the highest make-sense rating is selected as the correct interpretation, and the corresponding answer is returned, in this case Tex Avery created him:
"Tex Avery created him." ("No, the Who did not create him.") ("No, the Who did not create it.")
From Java using the TTConnection class and other languages using the ThoughtTreasure server protocol, you can also:
When the information necessary for your application is not yet in ThoughtTreasure, you can add it! I came up with a compact notation that makes entry fast. Here is a portion of the database file physics.txt that defines some subatomic particles:
==hadron.z//strongly#B interacting# particle*.z/hadron.My/
===baryon.z//qqq.tz/
====unflavored#A baryon*.z//|strangeness-of=0u|charm-of=0u|bottomness-of=0u|
=====nucleon.z/fermion/|spin-of=.5u|isospin-of=.5u|
======neutron.z//n.tz/n# baryon*.z/neutron.My/|electric-charge-of=0u|
 baryon-number-of=1u|part-of¤up-quark,down-quark,down-quark|
======antineutron.z//antineutron.My/|electric-charge-of=0u|
 baryon-number-of=-1u|antiparticle-of=neutron|
======proton.z//p.tz/proton.My/|electric-charge-of=1u|baryon-number-of=1u|
 part-of¤up-quark,up-quark,down-quark|
======antiproton.z//antiproton.My/|electric-charge-of=-1u|baryon-number-of=-1u|
 antiparticle-of=proton|

The level of indentation shows the hierarchy: a baryon is a type of hadron; a neutron, antineutron, proton, and antiproton are all types of nucleon.
Words for concepts are defined at the same time concepts are defined. The leftmost word or phrase is also used as the name of the concept. So, hadron is a concept, and words for this concept are hadron in English, strongly interacting particle in English, and hadron in French.
If there is a conflict, you can define the concept name without defining a word:
==particle-hadron//hadron.z/
Single characters are used to represent features. For example, lower case z means English. Lower case y means French. The feature character A means adjective and B means adverb. (See the complete list of feature characters.) Lexical entries are assumed to be nouns unless otherwise specified.
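To give a feel for the notation, here is a rough Java sketch that decomposes one line of it into its depth, words, and feature characters. This is an invented helper, not part of ThoughtTreasure, and it ignores several details of the real format (the '#' and '*' marks inside phrases, and the assertion fields between '|' characters):

public class EntryParser {
    public static void main(String[] args) {
        String line = "===baryon.z//qqq.tz/";

        // The number of leading '='s gives the depth in the hierarchy.
        int depth = 0;
        while (line.charAt(depth) == '=') depth++;

        // Fields are separated by '/'; the characters after the last '.'
        // of a word are its features, e.g. z (English), y (French).
        for (String w : line.substring(depth).split("/")) {
            if (w.isEmpty()) continue;
            int dot = w.lastIndexOf('.');
            String text  = dot < 0 ? w : w.substring(0, dot);
            String feats = dot < 0 ? "" : w.substring(dot + 1);
            System.out.println(depth + ": <" + text + "> features <" + feats + ">");
        }
    }
}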
In this notation inheritance is represented by indentation, but some concepts have several parents. The second element of each line in between the slashes can be used to indicate additional parents. For example, a nucleon is a fermion as well as an unflavored-baryon:
====unflavored#A baryon*.z//|strangeness-of=0u|charm-of=0u|bottomness-of=0u|
=====nucleon.z/fermion/|spin-of=.5u|isospin-of=.5u|
So we can define words, phrases, and concepts all at the same time and show their inheritance. We can also easily add assertions. For example, we can define some assertions on a proton: its electric charge is 1, its baryon number is 1, and it consists of two up quarks and a down quark:
======proton.z//p.tz/proton.My/|electric-charge-of=1u|baryon-number-of=1u|
 part-of¤up-quark,up-quark,down-quark|
Single character feature codes are also used to represent the argument structure of verbs. For example, é indicates that a verb takes a direct object:
call.Véz/

V = verb
é = takes direct object
z = English
Each word in a phrase is followed by either a star or number sign, which indicates whether that word is inflected. We say I call up but he calls up:
call* ø up#R.VÀéz/

* = inflects
# = does not inflect
R = preposition
ø = location of direct object
V = verb
À = Americanism
ø indicates the location of the direct object: we say call him up and not call up him.
Some verbs take prepositional phrase arguments, such as lie on as in I was lying on the couch. A plus sign after the preposition indicates it is required; an underscore indicates it is optional:
lie* on+.Vz/
lie* down#R on_.Vz/

+ = required preposition
_ = optional preposition

There are a number of other single-character codes for encoding verbs:
Verb features:

ú = subject assigned to slot 2
ü = subject assigned to slot 3
è = object assigned to slot 1
é = object assigned to slot 2
ë = object assigned to slot 3
÷ = indicative "(that) he goes"
O = subjunctive "(that) he go"
Ï = infinitive "(for him) to go"
± = present participle "(him) going"
Elsewhere I describe in detail how to extend ThoughtTreasure for a sample application, information extraction for movie reviews.
Concluding from years of AI research that there is no single, "right" representation, I decided to use multiple representation schemes in ThoughtTreasure:
Settings such as the ground floor of a theater are represented in ThoughtTreasure as grids (a kind of ASCII art):
==Opera-Comique-Rez/floor/|col-distance-of=.3m|row-distance-of=.6m|level-of=0u|
 orientation-of=south|part-of=Opera-Comique|polity-of=2arr|has-ceiling|
@19980201:naÐ|GS=
[character grid of the ground floor: rows of the symbols defined below,
 drawing the walls, lobby, ticket counter, stairways, and stage entrance]
.
p:door t:counter u:stairs-up w:wall
A:employee-side-of-counter B:customer-side-of-counter C:column M:mirror
P:revolving-door Q:stage-entrance R:&Opera-Comique-cash-register
T:&Opera-Comique-ticket-box X.tpCw:&Opera-Comique-box-office
1:&Opera-Comique-Entrance1 2:&Opera-Comique-0To1A 3:&Opera-Comique-0To1B
4:&Opera-Comique-0To1Stage 5:&Opera-Comique-Stage-Entrance
.
|
This grid can be taken literally as a particular ground floor of a theater in Paris, or used to guess how any theater ground floor might be arranged.
The definition consists of the name of the grid and then several assertions about the grid: The distance between each successive grid element horizontally is .3 meters; the distance between each grid element vertically is .6 meters; this is the ground floor, level 0; its orientation is south; it is part of the Opera Comique building; it is located in the 2nd arrondissement of Paris; and it has a ceiling.
Then we define the state of this grid as of February 1, 1998. Each character is defined at the bottom: The character w indicates a new instance of an object of type wall. p defines an object of type door.
On the left are some doors leading into the middle lobby area. When this definition is read in, contiguous instances of the same letter are treated as a single object. So three ps in a row on the left constitute a single door, and the next three ps constitute another door.
t indicates an object of type counter. In the middle we have a large contiguous ticket counter. Once these objects are defined, they can potentially be moved in the grid.
Something called a wormhole allows us to define a bunch of grids separately. We define the grid for the ground floor of this theater, and then we define the grid for the first floor of the theater, and then we connect those grids using wormholes.
At the top you will notice there is a 2. Down in the legend, 2 is defined as being a wormhole which connects the ground floor to the first floor. The first floor, which is defined by another grid, also contains that same wormhole. So if an actor wants to get from the ground floor to the first floor, they would go through that wormhole.
Grids make it easy to code new stereotypical settings. They also allow certain 2-D spatial operations to be performed efficiently. For example, a grid representation lends itself to finding paths through a space containing many obstacles. Paths are used in ThoughtTreasure to make inferences about whether it is possible to walk from one location to another, or whether two people can hear each other.
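As a toy illustration of why this is cheap, here is a self-contained Java sketch of breadth-first path search over two small character grids joined by a wormhole. The grids, the hardcoded wormhole destination, and all names are invented for this example; ThoughtTreasure's actual grid code is C:

import java.util.*;

public class GridPath {
    // Two tiny floors; 'w' is wall, '2' a wormhole joining them, 'X' the goal.
    static final String[][] FLOORS = {
        { "wwwww",
          "w  2w",
          "w   w",
          "wwwww" },
        { "wwwww",
          "w2  w",
          "w  Xw",
          "wwwww" } };

    public static void main(String[] args) {
        // Breadth-first search over states (floor, row, column, steps).
        Deque<int[]> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(new int[] {0, 2, 1, 0});  // start on floor 0 at (2,1)
        while (!queue.isEmpty()) {
            int[] s = queue.poll();
            int f = s[0], r = s[1], c = s[2], d = s[3];
            char ch = FLOORS[f][r].charAt(c);
            if (ch == 'w' || !seen.add(f + "," + r + "," + c)) continue;
            if (ch == 'X') { System.out.println("reached goal in " + d + " steps"); return; }
            if (ch == '2')  // wormhole: step onto the matching cell upstairs
                queue.add(new int[] {1, 1, 1, d + 1});
            queue.add(new int[] {f, r + 1, c, d + 1});
            queue.add(new int[] {f, r - 1, c, d + 1});
            queue.add(new int[] {f, r, c + 1, d + 1});
            queue.add(new int[] {f, r, c - 1, d + 1});
        }
    }
}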
ThoughtTreasure can use grids and wormholes to plan a trip, such as one from a street in Paris to New York:
Rules of thumb such as the following are represented procedurally in ThoughtTreasure:
Buy tickets at the box office.
Look both ways before crossing the street.
Turn the shower on before getting in.
Don't wear the same clothes two days in a row.
Don't leave home without your wallet.
At the end of a phone conversation, say bye and hang up.
Let's take just the first example, Buy tickets at the box office, and show how it is represented in the following planning agent for purchasing a ticket P on behalf of actor A:
purchase-ticket(A, P) :-
  dress(A, purchase-ticket),
  RETRIEVE building-of(P, BLDG);
  near-reachable(A, BLDG),
  near-reachable(A, FINDO(box-office)),
  near-reachable(A, FINDO(customer-side-of-counter)),
2: interjection-of-greeting(A, B = FINDO(human NEAR employee-side-of-counter)),
  WAIT FOR may-I-help-you(B, A)
  OR WAIT 10 seconds AND GOTO 2,
5: request(A, B, P),
6: WAIT FOR I-am-sorry(B) AND GOTO 13
  OR WAIT FOR describe(B, A, TKT = ticket) AND GOTO 8
  OR WAIT 20 seconds AND GOTO 5,
8: WAIT FOR propose-transaction(B, A, TKT, PRC = currency),
  IF TKT and PRC are OK accept(A, B) AND GOTO 10
  ELSE decline(A, B) AND GOTO 6,
10: pay-in-person(A, B, PRC),
  receive-from(A, B, TKT) ON FAILURE GOTO 13,
  ASSERT owner-of(TKT, A),
  post-sequence(A, B),
  SUCCESS,
13: post-sequence(A, B),
  FAILURE.
A planning agent is an arbitrary finite-state machine that can loop around, wait for various conditions, and so forth. This planning agent works as follows: First a subgoal is activated for A to get dressed for purchasing a ticket. That invokes another planning agent for dressing which causes A to select clothes that are appropriate to purchasing a ticket and then to get dressed. Once that subgoal is achieved then another subgoal is activated for A to be near the box office.
Of course there could be other plans for purchasing a ticket. This particular planning agent only knows about the plan of going to the box office to purchase the ticket.
A goes to the box office. Then A goes to the customer side of the counter in the box office. FINDO (find object) finds a nearby object of a particular type and returns it. That is, once you get to the box office, you can then locate the customer side of the counter.
Then state 2 of the finite-state machine is as follows: A says hello to the human who is located on the employee side of the counter. Then the planning agent waits until that person, called B, says may I help you, or any other lexical item defined for the may-I-help-you concept. Or it waits 10 seconds and then goes back to state 2. This planning agent will say hello every 10 seconds until it gets some service.
Once B says may I help you, A requests the ticket. The planning agent then waits for an I-am-sorry response such as We're all sold out, in which case the finite-state machine goes to state 13, A says goodbye to B, and the subgoal fails.
Or, the planning agent waits for B to describe a particular ticket to A in which case it goes to state 8. The agent then waits for B to specify a price for the ticket. A then has to decide whether the date, time, and location of the ticket are acceptable. If they are acceptable, A accepts and the planning agent goes to state 10.
Then the subgoal for A to pay B the agreed-upon amount is activated. That in turn activates a planning agent for paying, which uses various techniques such as cash, check, and credit card.
Then once it is paid, the agent activates the subgoal for A to receive the ticket from B. The receive planning agent involves extending a hand to receive the ticket from the other person and then grabbing it and putting it away. Then the fact that this ticket is now owned by A is asserted.
A says goodbye to B, and the planning agent terminates with success, so the subgoal succeeds. If the ticket is not received, the subgoal fails.
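To make the control structure concrete, here is a minimal self-contained Java rendering of a planning agent as a finite-state machine, loosely following purchase-ticket. Everything here is illustrative: the real agents are C code with many more states, waits, and conditions.

public class PurchaseTicketAgent {
    enum State { GREET, REQUEST, AWAIT_OFFER, PAY, DONE, FAILED }

    State state = State.GREET;

    // Called once per simulation tick with the latest utterance heard
    // from the ticket seller (null if nothing was heard this tick).
    void step(String heard) {
        switch (state) {
            case GREET:
                say("hello");  // simplification of "say hello every 10 seconds"
                if ("may I help you".equals(heard)) state = State.REQUEST;
                break;
            case REQUEST:
                say("one ticket, please");
                state = State.AWAIT_OFFER;
                break;
            case AWAIT_OFFER:
                if ("I am sorry".equals(heard)) { say("goodbye"); state = State.FAILED; }
                else if (heard != null && heard.startsWith("that will be")) state = State.PAY;
                break;
            case PAY:
                say("here you are");  // pay-in-person, receive-from
                say("goodbye");       // post-sequence
                state = State.DONE;
                break;
            default:
                break;
        }
    }

    void say(String s) { System.out.println("A: " + s); }

    public static void main(String[] args) {
        PurchaseTicketAgent a = new PurchaseTicketAgent();
        a.step("may I help you");
        a.step(null);
        a.step("that will be 30 francs");
        a.step(null);
    }
}

Each call to step corresponds to one simulation tick; the agent reacts to what it has heard and moves to a new state.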
Corresponding to the planning agent of the customer is a planning agent for the employee working the box office:
work-box-office(B, F) :-
  dress(B, work-box-office),
  near-reachable(B, F),
  TKTBOX = FINDO(ticket-box);
  near-reachable(B, FINDO(employee-side-of-counter)),
/* HANDLE NEXT CUSTOMER */
100: WAIT FOR attend(A = human, B)
  OR pre-sequence(A = human, B),
  may-I-help-you(B, A),
/* HANDLE NEXT REQUEST OF CUSTOMER */
103: WAIT FOR request(A, B, R) AND GOTO 104
  OR WAIT FOR post-sequence(A, B) AND GOTO 110,
104: IF R ISA tod { current-time-sentence(B, A) ON COMPLETION GOTO 103 }
  ELSE IF R ISA performance { GOTO 105 }
  ELSE { interjection-of-noncomprehension(B, A) ON COMPLETION GOTO 103 }
105: find next available ticket TKT in TKTBOX for R;
  IF none { I-am-sorry(B, A) ON COMPLETION GOTO 103 }
  ELSE { describe(B, A, TKT) ON COMPLETION GOTO 106 },
106: propose-transaction(B, A, TKT, TKT.price),
  WAIT FOR accept(A, B) AND GOTO 108
  OR WAIT FOR decline(A, B) AND GOTO 105
  OR WAIT 10 seconds AND GOTO 105,
108: collect-payment(B, A, TKT.price, FINDO(cash-register)),
109: hand-to(B, A, TKT),
110: post-sequence(B, A) ON COMPLETION GOTO 100.
B dresses appropriately for the job, goes to the box office, and takes a position on the employee side of the counter. B waits for a customer A to look at B or say hello, says may I help you, waits for a request, and upon obtaining one tries to handle it.
If the request is What time is it?, B will answer with the time and the planning agent will go back to state 103 to wait for another request.
If the request is a performance, then B will try to find the next available ticket for that performance. If there are none left, B will say I'm sorry, we don't have any. Otherwise B will describe the attributes of that ticket and its price, and either wait for A to accept or decline.
If A accepts, B collects the payment and hands A the ticket. If the customer declines, the planning agent goes back to state 105 to suggest another possible ticket. If the request is not understood, B says What? to A and another request is awaited. When A says goodbye, B says goodbye and the planning agent goes back to state 100 to wait for other customers.
Finite-state machines provide an efficient way of simulating the behavior of actors, which is useful in applications such as interactive fiction or controlling a robot.
The following planning agents are currently defined:
Declarative versions of planning agents called scripts are also defined.
Knowledge about the behavior of electronic devices such as:
When an idle phone line goes off hook, the phone gives a dialtone. When a ringing phone goes off hook, the phone stops ringing and is connected to the caller.

is represented within the phone object planning agent:
H = FINDP(phone-handset, T)
IF condition(T, W) and W < 0 {
  /* T broken */
  ASSERT idle(T)
} ELSE IF idle(T) {
  IF off-hook(H) ASSERT dialtone(T)
} ELSE IF dialtone(T) {
  IF on-hook(H) ASSERT idle(T)
  ...
} ELSE IF ringing(T, CLG = phone) {
  IF off-hook(H) {
    ASSERT voice-connection(CLG, T)
    ASSERT voice-connection(T, CLG)
  }
  ...
Human mental processes are also represented procedurally. If someone learns of the success of a friend, the emotion understanding agent will generate a positive emotion for that person. This agent also causes emotions to decay over time.
As we saw above, encyclopedic facts such as:
Washington is the capital of the U.S.
Mushrooms have stems.
Peas are green.
747s are made by Boeing.

are represented as assertions:
=====747.¹®z//|product-of¤Boeing|
======747#®-400#®.®z//|travel-crew-of=2u|
 travel-passengers-of=412¯509u|
 travel-cargo-capacity-of=6030lbs|
 length-of=211ft|width-of=231.9ft|height-of=63.5ft|
 weight-of=870000lbs|travel-max-speed-of=630mph|
 travel-max-distance-of=8380mi|
 @198805|create¤Boeing|
Some linguistic knowledge is hardcoded, such as the fact that you refers to the listener of the conversation.
The fact that a question of the form Is NOUN ADJECTIVE? requires a response of yes or no is coded in the Yes-No question understanding agent.
The fact that we call gold hair blond hair is a selectional restriction, which is represented as an assertion:
==yellow#-orange*.NAz//jaune*-orange*.My/|frequency-of=587.9ang|
===gold.NAz//couleur* d#R'or#NM.Fy/
====blond.Az//blond.Ay/|r1=hair|
The fact that tonight refers to the night of the current day is represented as an assertion and used by the time text agent to parse temporal expressions, and the generator to generate temporal expressions:
==relative-day-and-part-of-the-day/temporal/
 |unit1-of=day|unit2-of=tod|
===last#A night*.Éz//
 |min-value1-of=-1u|max-value1-of=-1u|
 min-value2-of=NUMBER:tod:2100|
 max-value2-of=NUMBER:tod:2400|
===tonight.Éz//ce#D soir#.ÉMy/
 |min-value1-of=0u|max-value1-of=0u|
 min-value2-of=NUMBER:tod:2100|
 max-value2-of=NUMBER:tod:2400|
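Resolving such an entry against the current date is straightforward. Here is a small Java sketch using java.time as a stand-in for ThoughtTreasure's own time code; the variables mirror the min/max-value fields of the tonight entry above:

import java.time.*;

public class Tonight {
    public static void main(String[] args) {
        LocalDate today = LocalDate.now();

        int dayOffset = 0;  // min/max-value1-of = 0u: the current day
        LocalDate day = today.plusDays(dayOffset);

        LocalDateTime from = day.atTime(21, 0);               // min-value2-of = tod 2100
        LocalDateTime to   = day.plusDays(1).atStartOfDay();  // max-value2-of = tod 2400

        System.out.println("tonight = [" + from + ", " + to + ")");
        // For last night, dayOffset would be -1 with the same time-of-day range.
    }
}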
The fact that an X-phile is someone who likes X is represented as an assertion and used by ThoughtTreasure's word formation mechanisms to learn new words:
==phile.»z//|lhs-pos-of=Adjective|
 rhs-pos-of=Noun|rhs-class-of=human|
 [rhs-assertion-of phile [like rhs-obj lhs-obj]]|
ThoughtTreasure currently contains 152 English affixes associated with 38 derivational rules.
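As an invented illustration of such a derivational rule in action, the following Java fragment decomposes an unseen word ending in -phile and instantiates the rule's assertion template; the real mechanism also handles the other affixes and checks parts of speech:

public class Phile {
    public static void main(String[] args) {
        String word = "operaphile";  // an unknown word encountered in text
        if (word.endsWith("phile")) {
            String lhs = word.substring(0, word.length() - "phile".length());
            // rhs-class-of=human: the derived concept denotes a person.
            System.out.println(word + " isa human");
            // [rhs-assertion-of phile [like rhs-obj lhs-obj]] instantiated:
            System.out.println("[like " + word + " " + lhs + "]");
        }
    }
}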
Getting a computer to understand natural language text is a very difficult problem. Why is this? An important reason is that natural language is highly ambiguous. Consider the text:
I am

This could be a pronoun followed by a form of the verb be, or the Roman numeral I followed by the abbreviation for before noon. Similarly, Pierre could be a human name or the capital of South Dakota. Even short sentences can have hundreds of parses! A computer has no way of knowing which one is right.
Or does it? Couldn't one have the computer keep interpretations that make sense, and discard those that don't? That is in fact what ThoughtTreasure's understanding agents attempt to do.
Understanding agents decide whether a given input concept makes sense given the current context. Each understanding agent returns a number which indicates how much an input concept makes sense along with reasons why that concept makes or does not make sense.
Let's look at an example of how ThoughtTreasure's understanding agents work to understand a text involving emotions and interpersonal relations. First, we input the text Jacques is an enemy of François. The friend understanding agent accepts and stores that assertion:
> Jacques is an enemy of François.
[enemy-of Francois Jacques]

We then input that Jacques hates François. The friend understanding agent returns that this makes sense, because hating is consistent with being an enemy of, and that is generated in English as Right, he is an enemy of François:
> He hates François.
[like-human Jacques Francois NUMBER:u:-0.55]
----UA_Friend----
makes sense because
[enemy-of Francois Jacques]
Right, he is an enemy of François.

Then we input He uses tu with François. Now, this does not make sense to the friend understanding agent, because in French enemies do not typically use tu with each other unless they are being very disrespectful. So ThoughtTreasure generates But I thought that he was an enemy of François. Still, it is the only interpretation that ThoughtTreasure finds, so it is accepted:
> He uses tu with François.
[tutoyer Jacques Francois]
----UA_Friend----
does not make sense because
[enemy-of Francois Jacques]
But I thought that he was an enemy of François.

Then another input is entered, which is again stored by the friend agent:
> Lionel is an enemy of Jacques.
[enemy-of Jacques Lionel]

Then we input that Lionel uses tu with Jacques, and this makes sense to the understanding agent because they both went to the École nationale d'administration, and people from that school use tu with each other:
> He uses tu with Jacques.
[tutoyer Lionel Jacques]
----UA_Friend----
makes sense because
[diploma-of Lionel promotion-Stendhal na ENA]
[diploma-of Jacques promotion-Vauban na ENA]
Right, Lionel holds a "promotion Stendhal" from the "École nationale d'administration" and he holds a "promotion Vauban" from the "École nationale d'administration".

Then we input that Jacques succeeded at being elected President of France, which is stored:
> Jacques succeeded at being elected President of France.
[succeeded-goal Jacques [President-of France Jacques]]

Then the fact that Jacques is happy is input. This makes sense to the emotion understanding agent because a positive emotion had already been generated when that goal succeeded:
> He is happy.
[happiness Jacques]
----UA_Emotion----
makes sense because
[succeeded-goal Jacques [President-of France Jacques]]
Right, he succeeds at being the president of France.

Then Lionel is resentful toward Jacques is entered, which makes sense to the emotion understanding agent because a goal succeeded for Jacques and Jacques is an enemy of Lionel:
> Lionel is resentful toward Jacques.
[resentment Lionel Jacques]
----UA_Emotion----
makes sense because
[succeeded-goal Jacques [President-of France Jacques]]
[enemy-of Jacques Lionel]
Right, he succeeds at being the president of France and Lionel is his enemy.

Then we input François is happy for Jacques. This makes sense to the extent that Jacques experienced a goal outcome, but does not make sense because Jacques is an enemy of François:
> François is happy for Jacques.
[happy-for Francois Jacques]
----UA_Emotion----
makes sense because
[succeeded-goal Jacques [President-of France Jacques]]
does not make sense because
[enemy-of Francois Jacques]
True, he succeeds at being the president of France. But I thought that he was an enemy of François.
The emotion understanding agent makes reference to a number of emotion concepts, which are listed in the emotion ontology. There are 650 emotion words in French and English attached to 83 emotion concepts, organized into positive and negative emotions, amusement, contentment, ecstasy, enjoyment, gloating, gratitude, gratification, happiness, and so on.
The output of the semantic parser consists of basic assertions that are fairly close to the input text: Jacques is an enemy of François becomes [enemy-of Francois Jacques]. That is not really semantics.
If the computer could instead construct a simulation of what is going on in the input text and keep that simulation in sync with that input text, then the computer would be able to understand at a detailed level. It could easily answer questions by examining the state of the simulation.
We can think of the story understanding process as lining up the cylinders of a combination lock. Each planning agent is a cylinder that is in a given position. For example, the work-box-office planning agent discussed above could be in the state where it is handling a certain customer's request, state 103. What the understanding agents try to do is line up those planning agent cylinders given each new input:
         PA1 PA2 PA3
        +---+---+---+
        | 8 | 8 | 8 |
        | 9 | 9 | 9 |
 state- | 0 | 0 | 0 |
        | 1 | 1 | 1 |
        | 2 | 2 | 2 |
        +---+---+---+
              |
              | UAs
              V
        +---+---+---+
        | 8 | 1 | 6 |
        | 9 | 2 | 7 |
 state- | 0 | 3 | 8 |
        | 1 | 4 | 9 |
        | 2 | 5 | 0 |
        +---+---+---+
If we have a story that says The ticket seller gave a ticket to the customer then the work-box-office understanding agent would line up the work-box-office planning agent to that state. And of course other understanding agents such as purchase-ticket would line up their corresponding planning agents to their appropriate states.
So we have these planning agent cylinders and given every input, the understanding agents try to line them up.
Here is an example handled by ThoughtTreasure using this scheme. We input Jim Garnier was sleeping. Now, we have already entered into the database a lot of information about this character Jim Garnier: where he lives, what the apartment he lives in contains, and so forth. When this is input, the sleep understanding agent spins the sleep planning agent to the SLEEP state.
When we spin a planning agent to a state, we also have to perform some of the subgoals leading up to that state. So as part of the process of spinning the sleep planning agent to the SLEEP state, a subgoal for Jim to be in his bed is activated and achieved, so that Jim is then located in his bed in the grid:
> Jim Garnier was sleeping.
UA_Sleep--PA_Sleep in SLEEP state
--PA_Ptrans--Jim located in bed:
[character grid of Jim's apartment showing Jim now located in his bed]
Then we enter He woke up. The sleep understanding agent then spins the sleep planning agent to the AWAKE state:
> He woke up.
UA_Sleep--PA_Sleep in AWAKE state
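Mechanically, spinning a planning agent amounts to forcing its finite-state machine into a named state after first achieving the subgoals on the way there. A minimal Java sketch, with invented state and subgoal names:

import java.util.*;

public class SpinToState {
    enum State { START, SLEEP, AWAKE }

    // Subgoals that must be achieved on the way into each state.
    static final Map<State, List<String>> PREREQS = Map.of(
        State.SLEEP, List.of("ptrans Jim to bed", "lie down"),
        State.AWAKE, List.of());

    static State current = State.START;

    static void spinTo(State target) {
        for (String g : PREREQS.getOrDefault(target, List.of()))
            System.out.println("achieve subgoal: " + g);  // updates the grid
        current = target;
        System.out.println("PA_Sleep now in " + target + " state");
    }

    public static void main(String[] args) {
        spinTo(State.SLEEP);  // "Jim Garnier was sleeping."
        spinTo(State.AWAKE);  // "He woke up."
    }
}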
Then we input that Jim poured shampoo on his hair. The shower understanding agent spins the shower planning agent to the READY-TO-LATHER state, in other words the state just after shampoo is poured on the hair. In the process of spinning that shower planning agent to the READY-TO-LATHER state, a number of subgoals first must be achieved: Jim must get up out of the bed, walk to the shower, and turn on the shower:
> He poured shampoo on his hair.
UA_Shower--PA_Shower in READY TO LATHER state
--PA_Ptrans--Jim located in shower:
--PA_ShowerOn
[character grid of Jim's apartment showing Jim now located in the shower, with the shower running]

Now we can ask questions of ThoughtTreasure regarding the simulation up to this point:
> Jim lay down on his bed when?
He lay down on his bed on Thursday January 15, 1998 at midnight.

> Jim was asleep when?
He was asleep between Thursday January 15, 1998 at midnight and Thursday January 15, 1998 at seven am.

> Jim stood up when?
He stood up on Thursday January 15, 1998 at seven am.
He stood up on Thursday January 15, 1998 at midnight.

> Jim was awake when?
He was awake on Thursday January 15, 1998 at seven am.
He was awake on Thursday January 15, 1998 at midnight.
Everything in the simulation is stored using specific timestamps. In fact, everything in the simulation is concrete: arbitrary decisions are made in the simulation that are subject to later modification. Others have designed systems in which only what is explicitly stated in the input is asserted and inferences are made on demand during question answering. Here we attempt another approach, which is to generate the exact state of the world, up to a certain level of granularity, in advance of any question answering.
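A sketch of what such a concrete, timestamped store looks like in Java (the class and names are illustrative; the timestamps are the ones the simulation chose above):

import java.time.LocalDateTime;
import java.util.*;

public class WhenStore {
    static class Fact {
        final String assertion; final LocalDateTime from, to;
        Fact(String a, LocalDateTime f, LocalDateTime t) { assertion = a; from = f; to = t; }
    }

    public static void main(String[] args) {
        LocalDateTime midnight = LocalDateTime.of(1998, 1, 15, 0, 0);
        LocalDateTime sevenAm  = LocalDateTime.of(1998, 1, 15, 7, 0);
        List<Fact> facts = Arrays.asList(
            new Fact("[asleep Jim]", midnight, sevenAm),
            new Fact("[location-of Jim bed]", midnight, sevenAm));

        // "Jim was asleep when?" = retrieve the stored interval.
        for (Fact f : facts)
            if (f.assertion.equals("[asleep Jim]"))
                System.out.println("between " + f.from + " and " + f.to);
    }
}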
We can ask more questions:
> Jim was in his foyer when?
He was in his foyer on Thursday January 15, 1998 at midnight.

> Jim was in his bedroom when?
He was in his bedroom between Thursday January 15, 1998 at midnight and Thursday January 15, 1998 at seven am.

> Jim was in his bathroom when?
He was in his bathroom on Thursday January 15, 1998 at seven am.

In the simulation it turns out Jim was in the foyer before he went to sleep. That is where he was placed by default, because the simulation did not know where he was initially. ThoughtTreasure then assumed he woke up at some normal time, arbitrarily taken to be 7 a.m.
That is the basic approach to story understanding in ThoughtTreasure. Of course, coding the understanding agents and the planning agents is a very time-consuming process because there are so many natural language constructions which correspond to so many different configurations of the simulation.
But I do not really see any way around it. What we have to do is look at every natural language construction and figure out how it maps to simulations. Just for the sleep agents, we have:
Mary is sleeping.
Mary is lying awake in her bed.
Mary was lying asleep in her bed.
Mary was asleep and Peter did not want to wake her.
At ten in the morning, Mary was still asleep.
Mary had only slept a few hours.
...