ThoughtTreasure Knowledge Base (TTKB) version 0.00022
by Erik T. Mueller
November 30, 1999
ThoughtTreasure Knowledge Base (TTKB) is a dump of the ThoughtTreasure
lexicon and knowledge base in a format that is easier for other programs
to read and use than the ThoughtTreasure database files.
TTKB can be accessed:
- from the web using the python/query.cgi CGI script,
- using the python/ttkb command,
- from Python using the routines provided in python/ttkb.py, and
- from other languages by reading TTKB.
Human coders add to the ThoughtTreasure database files in the db
subdirectory of the ThoughtTreasure distribution. The runtime/grind.sh
script converts the ThoughtTreasure database files into TTKB, which
resides in the ttkb subdirectory.
The ttkb subdirectory contains three files:
- infl.txt - inflection file
- le.txt - lexical entry file
- obj.txt - object/assertion file
Inflection file
Sample lines of infl.txt are:
dog_collars /PNz/ dog_collar-Nz
rêvassassions /iP1JVy/ rêvasser-Vy
Each line of infl.txt is of the form:
word_or_phrase infl_features le_uid
- word_or_phrase - Inflection (word or phrase) with spaces mapped to
underscores ("_").
- infl_features - ThoughtTreasure features associated with the
inflection, preceded and followed by a slash ("/"). For example,
features provide number, part of speech, and language.
- le_uid - Unique identifier of the associated lexical entry. The
word_or_phrase is one possible inflection for this lexical entry.
The infl.txt file is sorted by word_or_phrase so
programs can use binary search to look up inflections.
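The lookup described above can be sketched in Python as follows. This is a minimal illustration, not the code in python/ttkb.py: the names parse_infl_line and InflectionIndex are hypothetical, and the sketch assumes the file's sort order agrees with Python's string comparison.

```python
import bisect


def parse_infl_line(line):
    """Split one infl.txt line into (word_or_phrase, features, le_uid)."""
    word_or_phrase, infl_features, le_uid = line.split()
    # Drop the slash before and after the feature characters.
    return word_or_phrase, infl_features[1:-1], le_uid


class InflectionIndex:
    """In-memory index over infl.txt lines (hypothetical helper)."""

    def __init__(self, lines):
        # Assumption: the lines are already sorted by word_or_phrase.
        self.entries = [parse_infl_line(l) for l in lines if l.strip()]
        self.keys = [entry[0] for entry in self.entries]

    def lookup(self, phrase):
        """Return all entries whose word_or_phrase matches the phrase."""
        key = phrase.replace(" ", "_")
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.entries[lo:hi]


index = InflectionIndex([
    "dog_collars /PNz/ dog_collar-Nz",
    "rêvassassions /iP1JVy/ rêvasser-Vy",
])
print(index.lookup("dog collars"))
```

Several lines may share the same word_or_phrase, so the lookup returns a list. When reading the real file, the character encoding must be chosen by the caller; ISO 8859-1 is an assumption, not something this section states.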
Lexical entry unique identifiers
A lexical entry unique identifier (le_uid) consists of:
- the citation form of the lexical entry with spaces mapped to
underscores ("_"), followed by
- a dash ("-"), followed by
- the feature character for the gender of the lexical entry, where
applicable, followed by
- the feature character for the part of speech of the lexical entry,
followed by
- the feature character for the language of the lexical entry.
Sample le_uids are:
dormir-Vy
knowledge_acquisition-Nz
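A le_uid can be taken apart along the lines of the description above. The following Python sketch uses hypothetical names (LeUid, parse_le_uid) and assumes that the feature characters never include a dash, so that splitting on the last dash is safe even if the citation form itself contains one.

```python
from typing import NamedTuple, Optional


class LeUid(NamedTuple):
    citation_form: str      # still with "_" in place of spaces
    gender: Optional[str]   # None when no gender character is present
    pos: str                # part-of-speech feature character
    language: str           # language feature character


def parse_le_uid(le_uid):
    """Split a le_uid into citation form and feature characters."""
    # Assumption: the features follow the LAST dash in the identifier.
    citation_form, dash, features = le_uid.rpartition("-")
    if not dash or len(features) not in (2, 3):
        raise ValueError(f"unexpected le_uid: {le_uid!r}")
    gender = features[0] if len(features) == 3 else None
    return LeUid(citation_form, gender, features[-2], features[-1])


print(parse_le_uid("dormir-Vy"))
print(parse_le_uid("knowledge_acquisition-Nz"))
```

The citation form is returned with its underscores intact, because an underscore does not always stand for a space: the sample le_uid don_t_mention_it-xz in le.txt shows one in place of a single quote.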
Lexical entry file
Sample lines of le.txt are:
create-Vz /Vz/ ·· create /¹/ 1:subj::::0 2:obj::::0
director-Nz /Nz¸/ ·· Director /¹É/ TV-director // film-director // director /¹/
don_t_mention_it-xz /xz/ ·'·_·_·· interjection-of-response-to-thanks //
hammer-Vz /Vz/ ·· hammer-into // 1:subj::::0 2:obj::::0 3:iobj:into-Rz:::0
kick-Vz /Vz/ ·· died /T/ 1:subj::::0 :expl:the_bucket-0z::V_O:0
knowledge_acquisition-Nz /Nz/ ·· knowledge-acquisition /¹/
Each line of le.txt is of the form:
le_uid le_features phrase_separators [leo ...]
- le_uid - Lexical entry unique identifier as described in the previous
section.
- le_features - ThoughtTreasure features of the lexical entry, preceded
and followed by a slash ("/").
- phrase_separators - A phrase such as "don't mention it" contains three
phrase separators: single quote ("'"), space (" "), and space (" ").
This field contains the phrase separators delimited by "·", with spaces
mapped to underscores ("_"). For "don't mention it", the field is
·'·_·_·· as shown in the sample lines above.
- [leo ...] - Zero or more lexical-entry-to-object connections (leos).
The le.txt file is sorted by le_uid.
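The fixed fields of a le.txt line can be split off as in the Python sketch below. The function name parse_le_line and the dictionary keys are hypothetical. The handling of phrase_separators is inferred from the single sample ·'·_·_·· and simply collects the non-empty pieces between "·" characters; this is an assumption, not a documented rule.

```python
def parse_le_line(line):
    """Split a le.txt line into its fixed fields and the leo tokens."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"too few fields: {line!r}")
    le_uid, le_features, phrase_separators = fields[:3]
    # Assumption: separators are the non-empty pieces between "·" marks,
    # with "_" standing for a space.
    separators = [
        piece.replace("_", " ")
        for piece in phrase_separators.split("·")
        if piece
    ]
    return {
        "le_uid": le_uid,
        "features": le_features[1:-1],   # without the enclosing slashes
        "separators": separators,
        "leo_tokens": fields[3:],        # left unparsed here
    }


entry = parse_le_line(
    "don_t_mention_it-xz /xz/ ·'·_·_·· "
    "interjection-of-response-to-thanks //"
)
print(entry["separators"])
```

An empty separator list should not be read as "single word": the sample entry knowledge_acquisition-Nz shows the field ·· even though its citation form contains a space.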
Lexical entry to object connections
Sample leos are:
create /¹/ 1:subj::::0 2:obj::::0
Director /¹É/
hammer-into // 1:subj::::0 2:obj::::0 3:iobj:into-Rz:::0
A leo consists of:
objname leo_features [theta_role ...]
- objname - Name of the object associated with the lexical entry. That
is, one of the lexical entry's meanings.
- leo_features - ThoughtTreasure features associated with this
particular lexical entry-object connection, preceded and followed by a
slash ("/").
- [theta_role ...] - Zero or more theta roles.
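Because a le.txt line may carry several leos one after another, a reader has to decide where one leo ends and the next begins. The Python sketch below (parse_leos is a hypothetical name) assumes that every theta role contains a colon and that object names never do, so any token with a colon after the features field belongs to the current leo.

```python
def parse_leos(tokens):
    """Group whitespace-separated tokens into (objname, features, roles)."""
    leos = []
    i = 0
    while i < len(tokens):
        objname = tokens[i]
        if i + 1 >= len(tokens):
            raise ValueError(f"missing leo_features after {objname!r}")
        features = tokens[i + 1]
        if len(features) < 2 or features[0] != "/" or features[-1] != "/":
            raise ValueError(f"bad leo_features: {features!r}")
        i += 2
        theta_roles = []
        # Assumption: theta roles contain ":", object names do not.
        while i < len(tokens) and ":" in tokens[i]:
            theta_roles.append(tokens[i])
            i += 1
        leos.append((objname, features[1:-1], theta_roles))
    return leos


tokens = "Director /¹É/ TV-director // film-director // director /¹/".split()
for leo in parse_leos(tokens):
    print(leo)
```

The theta roles are kept as raw strings here so that the grouping step stays independent of how the individual roles are interpreted.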
Theta roles
Sample theta roles are:
1:subj::::0
2:obj::::0
3:iobj:into-Rz:::0
A theta role consists of colon-separated items:
slot_number:case:le_uid:subcat:position:optional_flag
- slot_number - Slot number that this theta role fills in the assertion
(empty for the expl case).
- case - One of the following:
  - subj - subject
  - obj - object
  - iobj - indirect object
  - aobj - adjective object
  - kobj1 - first object of conjunction
  - kobj2 - second object of conjunction
  - expl - expletive. This is a word or phrase which must be present
    but does not appear in the result concept. An example is "kick the
    bucket": the concept for "bucket" does not appear in the result
    concept.
- le_uid - Lexical entry unique identifier of the preposition (for iobj)
or of the expletive phrase (for expl). Empty for the other cases in the
samples above.
- subcat - Subcategorization feature. Empty, as in the samples above, or
one of the following:
  - O - subjunctive
  - ÷ - indicative
  - Ï - infinitive
  - ± - present participle
- position - Position of the theta role. Empty, as in most of the
samples above, or one of the following:
  - _V - before the verb
  - V_O - after the verb and before the object
  - VO_ - after the object
- optional_flag - 0 if the theta role is required; 1 if the theta role
is optional.
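A theta role can be decoded by splitting on colons, as in the following Python sketch. The names ThetaRole and parse_theta_role are hypothetical, and the sketch assumes that no item (in particular no le_uid) contains a colon.

```python
from typing import NamedTuple, Optional

CASES = {"subj", "obj", "iobj", "aobj", "kobj1", "kobj2", "expl"}


class ThetaRole(NamedTuple):
    slot_number: Optional[int]  # None when the field is empty (expl)
    case: str
    le_uid: str                 # "" when absent
    subcat: str                 # "" when absent
    position: str               # "" when absent
    optional: bool


def parse_theta_role(text):
    """Decode slot_number:case:le_uid:subcat:position:optional_flag."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 items, got {len(parts)}: {text!r}")
    slot, case, le_uid, subcat, position, flag = parts
    if case not in CASES:
        raise ValueError(f"unknown case {case!r}")
    if flag not in ("0", "1"):
        raise ValueError(f"bad optional_flag {flag!r}")
    slot_number = int(slot) if slot else None
    return ThetaRole(slot_number, case, le_uid, subcat, position, flag == "1")


print(parse_theta_role("3:iobj:into-Rz:::0"))
print(parse_theta_role(":expl:the_bucket-0z::V_O:0"))
```

Empty items are kept as empty strings rather than dropped, so the six positions always line up with the field names.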
Object/assertion file
Sample lines of obj.txt are:
create-dig creuser-Vy dig-Vz [ako create-dig create-class]
fall-asleep endormir-Vy go-Vz fall-Vz [ako fall-asleep personal-script] [leadto1 fall-asleep asleep]
farmland terre-FNy land-Nz farmland-Nz [ako farmland landmass]
Each line of obj.txt is of the form:
objname [le_uid ...] [assertion ...]
- objname - Name of the object.
- [le_uid ...] - Zero or more lexical entry unique identifiers. That is,
words or phrases for the concept in English and/or French.
- [assertion ...] - Zero or more assertions about the object.
The obj.txt file is sorted by objname.
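Splitting an obj.txt line needs some care because assertions contain spaces and may nest. The Python sketch below uses hypothetical names (split_assertions, parse_obj_line) and assumes that the assertions start at the first field beginning with "[" or "@", and that brackets inside double-quoted strings do not count toward nesting.

```python
def split_assertions(text):
    """Split a run of assertions at top-level bracket boundaries."""
    assertions = []
    depth = 0
    in_string = False
    start = None
    for i, ch in enumerate(text):
        if start is None:
            if ch.isspace():
                continue
            start = i  # first character of the next assertion
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
                if depth == 0:
                    assertions.append(text[start:i + 1])
                    start = None
    if start is not None:
        raise ValueError("unbalanced brackets or quotes")
    return assertions


def parse_obj_line(line):
    """Return (objname, le_uids, assertions) for one obj.txt line."""
    rest = line.strip()
    fields = []
    # Assumption: neither objname nor any le_uid starts with "[" or "@".
    while rest and rest[0] not in "[@":
        parts = rest.split(None, 1)
        fields.append(parts[0])
        rest = parts[1] if len(parts) > 1 else ""
    if not fields:
        raise ValueError(f"missing objname: {line!r}")
    return fields[0], fields[1:], split_assertions(rest)


print(parse_obj_line(
    "fall-asleep endormir-Vy go-Vz fall-Vz "
    "[ako fall-asleep personal-script] [leadto1 fall-asleep asleep]"
))
```

Each assertion is returned as raw text, including any timestamp range prefix, so that it can be handed to a separate assertion parser.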
Assertions
Sample assertions are:
[ako fall-asleep personal-script]
[duration-of cafe-stay NUMBER:second:7200]
[event01-of collect-card-payment [hand-to buyer seller credit-card]]
[role01-of take-bus bus-rider]
@19890101T000000:19890101T000001|[born Patapouf Paris]
ThoughtTreasure assertions are defined by the following Extended Backus-Naur
Form (EBNF) grammar:
Assertion ::= (TimestampRange '|')? '[' Concept (' ' Concept)* ']'
Concept ::= Assertion | Token
TimestampRange ::= '@' Timestamp ':' Timestamp
Timestamp ::= 'na' | '-Inf' | '+Inf' | 'Inf' | ISO8601Subset
ISO8601Subset ::= Y Y Y Y M M D D 'T' H H M M S S 'Z'
| Y Y Y Y M M D D 'T' H H M M S S
| Y Y Y Y M M D D 'T' H H M M S S '-' H H M M
| Y Y Y Y M M D D 'T' H H M M S S '+' H H M M
| Y Y Y Y M M D D
| Y Y Y Y M M
| Y Y Y Y
Y ::= Digit
M ::= Digit /* month digit in the date part, minute digit in the time part */
D ::= Digit
H ::= Digit
S ::= Digit
Token ::= ObjName | String | Number | TimestampRange | Name
ObjName ::= ObjNameChar+
ObjNameChar ::= Digit | Letter | '-' | '?'
String ::= ('STRING:' ObjName ':')? '"' StringChar* '"'
Number ::= 'NUMBER:' ObjName ':' Double
Name ::= 'NAME:' '"' StringChar* '"'
Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
StringChar is any character except "
Letter is a-z or A-Z (ISO 8859-1 letters are not included)
Double is the output of C printf %g
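The grammar above lends itself to a small recursive-descent parser. The Python sketch below is an illustration under simplifying assumptions, not the ThoughtTreasure implementation: the function names are hypothetical, timestamps and tokens are returned as raw strings, and neither the ISO8601Subset shapes nor the ObjNameChar set are validated.

```python
import re

_RANGE = re.compile(r"@([^:|\[\] ]+):([^:|\[\] ]+)")
_QUOTED = re.compile(r'(?:STRING:[^:" ]+:|NAME:)?"[^"]*"')
_PLAIN = re.compile(r"[^ \[\]]+")


def _assertion(text, pos):
    time_range = None
    if text.startswith("@", pos):
        m = _RANGE.match(text, pos)
        if not m or not text.startswith("|", m.end()):
            raise ValueError(f"bad timestamp range at offset {pos}")
        time_range = (m.group(1), m.group(2))
        pos = m.end() + 1  # skip the "|"
    if not text.startswith("[", pos):
        raise ValueError(f"expected '[' at offset {pos}")
    pos += 1
    items = []
    while True:
        item, pos = _concept(text, pos)
        items.append(item)
        if text.startswith("]", pos):
            return {"time": time_range, "items": items}, pos + 1
        if not text.startswith(" ", pos):
            raise ValueError(f"expected ' ' or ']' at offset {pos}")
        pos += 1


def _concept(text, pos):
    if text.startswith("[", pos):
        return _assertion(text, pos)
    if text.startswith("@", pos):
        # "@a:b|" introduces an assertion; a bare "@a:b" is a token.
        m = _RANGE.match(text, pos)
        if m and text.startswith("|", m.end()):
            return _assertion(text, pos)
    m = _QUOTED.match(text, pos) or _PLAIN.match(text, pos)
    if not m:
        raise ValueError(f"expected a token at offset {pos}")
    return m.group(0), m.end()


def parse_assertion(text):
    """Parse one assertion into nested dicts and raw token strings."""
    node, pos = _assertion(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected text at offset {pos}")
    return node


print(parse_assertion("@19890101T000000:19890101T000001|[born Patapouf Paris]"))
print(parse_assertion(
    "[event01-of collect-card-payment [hand-to buyer seller credit-card]]"
))
```

Quoted strings are matched before plain tokens so that spaces and brackets inside a String or Name do not end the token early.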
Implementing parsers
When implementing parsers using the lexicon, one must be careful to
take into account constraints associated with the links from lexical
entries to concepts. Concepts should only be activated if all of the
hard constraints are satisfied. Concepts should be activated to the
extent that soft constraints are satisfied. The hard constraints are:
- Expletive constraints: An additional word or phrase must be present
at a certain location. For example, the link from the lexical entry
"kick" to the concept die requires the presence of "the bucket" after
the verb.
- Object constraints: A direct object must be present. For example,
the link from the lexical entry "hire" to the concept hire requires the
presence of a direct object.
- Indirect object constraints: An indirect object must be present and
preceded by a particular preposition. For example, the link from the
lexical entry "fly" to the concept fly-into requires the presence of an
indirect object preceded by the preposition "into".
The soft constraints are:
- Selectional constraints: Certain argument types are preferred. For
example, the first argument of meow is typically a cat.
- Filters: Certain syntactic environments are preferred. For example,
"Internet" prefers a definite article ("I connected to the Internet" is
preferred over "I connected to Internet"), and "to die for" prefers to
be postposed ("It's a chocolate sundae to die for" is preferred over
"It's a to die for chocolate sundae").
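One way to check the hard constraints is to read them off the theta roles of a leo. The Python sketch below rests on assumptions that this document does not state outright: that a required (optional_flag 0) expl, obj, or iobj theta role corresponds to an expletive, object, or indirect object constraint respectively, and that the parser summarizes what it found in a dictionary with the hypothetical keys shown. Soft constraints are not modeled.

```python
def hard_constraints_met(theta_roles, found):
    """Return True if every required expl/obj/iobj role is satisfied.

    theta_roles: theta role strings of one leo, e.g. "2:obj::::0".
    found: what the parser saw in the sentence (hypothetical layout):
        "expletives"        set of (le_uid, position) pairs
        "has_object"        True if a direct object is present
        "iobj_prepositions" set of preposition le_uids that introduce
                            an indirect object
    """
    for role in theta_roles:
        _slot, case, le_uid, _subcat, position, flag = role.split(":")
        if flag == "1":
            continue  # optional roles impose no hard constraint
        if case == "expl" and (le_uid, position) not in found["expletives"]:
            return False
        if case == "obj" and not found["has_object"]:
            return False
        if case == "iobj" and le_uid not in found["iobj_prepositions"]:
            return False
    return True


kick_roles = ["1:subj::::0", ":expl:the_bucket-0z::V_O:0"]
seen = {
    "expletives": {("the_bucket-0z", "V_O")},
    "has_object": False,
    "iobj_prepositions": set(),
}
print(hard_constraints_met(kick_roles, seen))
```

A parser built this way would activate a concept only when the function returns True, and would then use the soft constraints to decide how strongly to activate it.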
Questions or comments?
webmaster@signiform.com
Copyright © 2000 Signiform.
All Rights Reserved. Terms of use.