ThoughtTreasure Knowledge Base (TTKB) version 0.00022

by Erik T. Mueller

November 30, 1999


ThoughtTreasure Knowledge Base (TTKB) is a dump of the ThoughtTreasure lexicon and knowledge base in a format that is easier for other programs to read and use than the ThoughtTreasure database files. TTKB can be accessed:
  1. from the web using the python/query.cgi CGI script,
  2. using the python/ttkb command,
  3. from Python using the routines provided in python/ttkb.py, and
  4. from other languages by reading TTKB.

Human coders extend the ThoughtTreasure database files in the db subdirectory of the ThoughtTreasure distribution. The runtime/grind.sh script converts the ThoughtTreasure database files into TTKB, which resides in the ttkb subdirectory. This directory contains three files: infl.txt (the inflection file), le.txt (the lexical entry file), and obj.txt (the object/assertion file).

Inflection file

Sample lines of infl.txt are:

dog_collars /PNz/ dog_collar-Nz
rêvassassions /iP1JVy/ rêvasser-Vy

Each line of infl.txt is of the form:

word_or_phrase infl_features le_uid

word_or_phrase
  Inflection (word or phrase) with spaces mapped to underscores ("_").
infl_features
  ThoughtTreasure features associated with the inflection, preceded and followed by a slash ("/"). For example, the features indicate number, part of speech, and language.
le_uid
  Unique identifier of the associated lexical entry. The word_or_phrase is one possible inflection of this lexical entry.

The infl.txt file is sorted by word_or_phrase so that programs can use binary search to look up inflections.
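The binary search lookup described above might be sketched as follows. This is a minimal illustration, not the code in python/ttkb.py: the function names parse_infl_line and lookup are hypothetical, the sample lines stand in for the real file, and the sketch assumes the file's sort order agrees with Python's string comparison.

```python
import bisect

def parse_infl_line(line):
    """Split one infl.txt line into (word_or_phrase, infl_features, le_uid)."""
    word, features, le_uid = line.split()
    # Features are wrapped in slashes, e.g. "/PNz/"; strip them.
    return word, features.strip("/"), le_uid

def lookup(entries, keys, phrase):
    """Return every (infl_features, le_uid) pair recorded for a phrase."""
    key = phrase.replace(" ", "_")      # spaces are stored as underscores
    i = bisect.bisect_left(keys, key)   # binary search for the first match
    matches = []
    # One inflection may appear on several consecutive lines.
    while i < len(keys) and keys[i] == key:
        matches.append(entries[i][1:])
        i += 1
    return matches

# Sample lines from the text; a real program would read infl.txt instead.
sample = [
    "dog_collars /PNz/ dog_collar-Nz",
    "rêvassassions /iP1JVy/ rêvasser-Vy",
]
entries = [parse_infl_line(line) for line in sample]
keys = [entry[0] for entry in entries]

print(lookup(entries, keys, "dog collars"))   # [('PNz', 'dog_collar-Nz')]
```

Keeping a separate keys list lets bisect compare plain strings. For a large file, the same search could be run directly over file offsets instead of loading every line.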

Lexical entry unique identifiers

A lexical entry unique identifier (le_uid) consists of:

Sample le_uids are:

dormir-Vy
knowledge_acquisition-Nz

Lexical entry file

Sample lines of le.txt are:

create-Vz /Vz/ ·· create /¹/ 1:subj::::0 2:obj::::0
director-Nz /Nz¸/ ·· Director /¹É/ TV-director // film-director // director /¹/
don_t_mention_it-xz /xz/ ·'·_·_·· interjection-of-response-to-thanks //
hammer-Vz /Vz/ ·· hammer-into // 1:subj::::0 2:obj::::0 3:iobj:into-Rz:::0
kick-Vz /Vz/ ·· died /T/ 1:subj::::0 :expl:the_bucket-0z::V_O:0
knowledge_acquisition-Nz /Nz/ ·· knowledge-acquisition /¹/

Each line of le.txt is of the form:

le_uid le_features phrase_separators [leo ...]

le_uid
  Lexical entry unique identifier as described in the previous section.
le_features
  ThoughtTreasure features of the lexical entry, preceded and followed by a slash ("/").
phrase_separators
  A phrase such as "don't mention it" contains three phrase separators: single quote ("'"), space (" "), and space (" "). This field contains the phrase separators delimited by "·", with spaces mapped to underscores ("_").
[leo ...]
  Zero or more lexical entry to object connections (leos).

The le.txt file is sorted by le_uid.

Lexical entry to object connections

Sample leos are:

create /¹/ 1:subj::::0 2:obj::::0
Director /¹É/
hammer-into // 1:subj::::0 2:obj::::0 3:iobj:into-Rz:::0

A leo consists of:

objname leo_features [theta_role ...]

objname
  Name of the object associated with the lexical entry, that is, one of the lexical entry's meanings.
leo_features
  ThoughtTreasure features associated with this particular lexical entry-object connection, preceded and followed by a slash ("/").
[theta_role ...]
  Zero or more theta roles.
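Because a le.txt line holds a variable number of leos, each with a variable number of theta roles, a reader has to regroup the space-separated tokens. The sketch below shows one way to do this. The name parse_le_line and the dictionary layout are hypothetical, and the grouping relies on an assumption drawn from the samples: theta roles always contain colons and object names never do.

```python
def parse_le_line(line):
    """Parse one le.txt line into its fixed fields and a list of leos."""
    tokens = line.split()
    le_uid, le_features, phrase_separators = tokens[:3]
    rest = tokens[3:]
    leos = []
    i = 0
    while i < len(rest):
        # Each leo starts with an object name followed by /features/.
        objname = rest[i]
        leo_features = rest[i + 1].strip("/")
        i += 2
        theta_roles = []
        # Assumption: a token containing ":" is a theta role of this leo.
        while i < len(rest) and ":" in rest[i]:
            theta_roles.append(rest[i])
            i += 1
        leos.append({
            "objname": objname,
            "features": leo_features,
            "theta_roles": theta_roles,
        })
    return {
        "le_uid": le_uid,
        "features": le_features.strip("/"),
        "phrase_separators": phrase_separators,  # kept as the raw field
        "leos": leos,
    }

entry = parse_le_line(
    "kick-Vz /Vz/ ·· died /T/ 1:subj::::0 :expl:the_bucket-0z::V_O:0"
)
print(entry["leos"][0]["objname"])       # died
print(entry["leos"][0]["theta_roles"])   # ['1:subj::::0', ':expl:the_bucket-0z::V_O:0']
```

The sketch does not validate malformed lines. A leo that lacks its features field would raise an IndexError or be misread.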

Theta roles

Sample theta roles are:

1:subj::::0
2:obj::::0 
3:iobj:into-Rz:::0

A theta role consists of colon-separated items:

slot_number:case:le_uid:subcat:position:optional_flag

slot_number
  Slot number of this theta role in the assertion (empty for the expl case).
case
  One of the following:
le_uid
  Lexical entry unique identifier of the preposition (for the iobj case) or of the expletive phrase (for the expl case).
subcat
  Subcategorization feature. One of the following:
position
  Position of the theta role. One of the following:
optional_flag
  0 if the theta role is required; 1 if the theta role is optional.
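The six colon-separated items can be unpacked with a plain split, as in the following sketch. The ThetaRole type and the function name parse_theta_role are hypothetical, and the code assumes that none of the items contains a colon, which holds for the samples shown here.

```python
from typing import NamedTuple, Optional

class ThetaRole(NamedTuple):
    slot_number: Optional[int]   # None when the field is empty (expl case)
    case: str
    le_uid: str
    subcat: str
    position: str
    optional: bool

def parse_theta_role(text):
    """Parse 'slot_number:case:le_uid:subcat:position:optional_flag'."""
    fields = text.split(":")
    if len(fields) != 6:
        raise ValueError("expected 6 colon-separated items: %r" % text)
    slot, case, le_uid, subcat, position, flag = fields
    return ThetaRole(
        slot_number=int(slot) if slot else None,
        case=case,
        le_uid=le_uid,
        subcat=subcat,
        position=position,
        optional=(flag == "1"),
    )

print(parse_theta_role("3:iobj:into-Rz:::0"))
print(parse_theta_role(":expl:the_bucket-0z::V_O:0").slot_number)   # None
```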

Object/assertion file

Sample lines of obj.txt are:

create-dig creuser-Vy dig-Vz [ako create-dig create-class]
fall-asleep endormir-Vy go-Vz fall-Vz [ako fall-asleep personal-script] [leadto1 fall-asleep asleep] 
farmland terre-FNy land-Nz farmland-Nz [ako farmland landmass]

Each line of obj.txt is of the form:

objname [le_uid ...] [assertion ...]

objname
  Name of the object.
[le_uid ...]
  Zero or more lexical entry unique identifiers, that is, the words or phrases for the concept in English and/or French.
[assertion ...]
  Zero or more assertions about the object.

The obj.txt file is sorted by objname.
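An obj.txt line can be split into its three parts as sketched below. The names split_assertions and parse_obj_line are hypothetical. The sketch assumes that the first token beginning with "[" or "@" starts the assertions and that no le_uid begins with either character.

```python
import re

def split_assertions(text):
    """Split concatenated assertions at top-level closing brackets."""
    assertions, depth, in_string, start = [], 0, False, None
    for i, ch in enumerate(text):
        if start is None:
            if ch.isspace():
                continue          # skip whitespace between assertions
            start = i             # an assertion (or its "@" prefix) begins
        if ch == '"':
            in_string = not in_string   # brackets inside strings do not count
        elif not in_string:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
                if depth == 0:
                    assertions.append(text[start:i + 1])
                    start = None
    return assertions

def parse_obj_line(line):
    """Return (objname, le_uids, assertions) for one obj.txt line."""
    # Assumption: the first token starting with "[" or "@" begins the assertions.
    m = re.search(r"(?:^|\s)([\[@])", line)
    cut = m.start(1) if m else len(line)
    objname, *le_uids = line[:cut].split()
    return objname, le_uids, split_assertions(line[cut:])

objname, le_uids, assertions = parse_obj_line(
    "fall-asleep endormir-Vy go-Vz fall-Vz "
    "[ako fall-asleep personal-script] [leadto1 fall-asleep asleep] "
)
print(objname)      # fall-asleep
print(le_uids)      # ['endormir-Vy', 'go-Vz', 'fall-Vz']
print(assertions)   # ['[ako fall-asleep personal-script]', '[leadto1 fall-asleep asleep]']
```

Each assertion is returned as an unparsed string, including any timestamp prefix.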

Assertions

Sample assertions are:

[ako fall-asleep personal-script]
[duration-of cafe-stay NUMBER:second:7200]
[event01-of collect-card-payment [hand-to buyer seller credit-card]]
[role01-of take-bus bus-rider]
@19890101T000000:19890101T000001|[born Patapouf Paris]

ThoughtTreasure assertions are defined by the following Extended Backus-Naur Form (EBNF) grammar:
Assertion      ::= (TimestampRange '|')? '[' Concept (' ' Concept)* ']'
Concept        ::= Assertion | Token
TimestampRange ::= '@' Timestamp ':' Timestamp
Timestamp      ::= 'na' | '-Inf' | '+Inf' | 'Inf' | ISO8601Subset
ISO8601Subset  ::= Y Y Y Y M M D D 'T' H H M M S S 'Z'
                   | Y Y Y Y M M D D 'T' H H M M S S
                   | Y Y Y Y M M D D 'T' H H M M S S '-' H H M M
                   | Y Y Y Y M M D D 'T' H H M M S S '+' H H M M
                   | Y Y Y Y M M D D
                   | Y Y Y Y M M
                   | Y Y Y Y
Y              ::= Digit
M              ::= Digit
D              ::= Digit
H              ::= Digit
M              ::= Digit
S              ::= Digit
Token          ::= ObjName | String | Number | TimestampRange | Name
ObjName        ::= ObjNameChar+
ObjNameChar    ::= Digit | Letter | '-' | '?'
String         ::= ('STRING:' ObjName ':')? '"' StringChar* '"'
Number         ::= 'NUMBER:' ObjName ':' Double
Name           ::= 'NAME:' '"' StringChar* '"'
Digit          ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
StringChar is any character except "
Letter is a-z or A-Z (ISO 8859-1 letters are not included)
Double is the output of C printf %g
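
A small recursive-descent reader following the Assertion and Concept rules of the grammar above might look like the sketch below. It is illustrative only: the Assertion type and the function names are hypothetical, and tokens (object names, strings, numbers, names, timestamp ranges) are returned as raw strings without being validated against their own rules.

```python
from typing import NamedTuple, Optional, Tuple

class Assertion(NamedTuple):
    timestamps: Optional[Tuple[str, ...]]   # (start, end), or None if absent
    concepts: list                           # raw tokens or nested Assertions

def parse_assertion(text):
    """Parse one complete assertion string."""
    s = text.strip()
    if not s:
        raise ValueError("empty assertion")
    node, pos = _assertion(s, 0)
    if pos != len(s):
        raise ValueError("trailing text after assertion")
    return node

def _assertion(s, i):
    # Assertion ::= (TimestampRange '|')? '[' Concept (' ' Concept)* ']'
    timestamps = None
    if s[i] == "@":
        bar = s.index("|", i)   # str.index raises ValueError if "|" is missing
        timestamps = tuple(s[i + 1:bar].split(":"))
        i = bar + 1
    if i >= len(s) or s[i] != "[":
        raise ValueError("expected '[' at offset %d" % i)
    i += 1
    concepts = []
    while True:
        node, i = _concept(s, i)
        concepts.append(node)
        if i >= len(s):
            raise ValueError("missing ']'")
        if s[i] == "]":
            return Assertion(timestamps, concepts), i + 1
        if s[i] != " ":
            raise ValueError("expected ' ' or ']' at offset %d" % i)
        i += 1

def _concept(s, i):
    # Concept ::= Assertion | Token
    if i >= len(s):
        raise ValueError("unexpected end of input")
    if s[i] == "[":
        return _assertion(s, i)
    if s[i] == "@":
        # "@a:b|[...]" is a timestamped assertion; a bare "@a:b" is a token.
        j = i
        while j < len(s) and s[j] not in " ]|":
            j += 1
        if j < len(s) and s[j] == "|":
            return _assertion(s, i)
        return s[i:j], j
    # Other tokens run to the next space or "]" outside double quotes.
    j, in_string = i, False
    while j < len(s) and (in_string or s[j] not in " ]"):
        if s[j] == '"':
            in_string = not in_string
        j += 1
    if j == i:
        raise ValueError("empty token at offset %d" % i)
    return s[i:j], j

print(parse_assertion("[event01-of collect-card-payment [hand-to buyer seller credit-card]]"))
print(parse_assertion("@19890101T000000:19890101T000001|[born Patapouf Paris]").timestamps)
```

Splitting the timestamp range on ":" works because the timestamp forms in the grammar contain no colons.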

Implementing parsers

Parsers that use the lexicon must take into account the constraints attached to the links from lexical entries to concepts. A concept should be activated only if all of the hard constraints are satisfied, and then to the extent that the soft constraints are satisfied. The hard constraints are:

The soft constraints are:

Questions or comments? webmaster@signiform.com
Copyright © 2000 Signiform. All Rights Reserved. Terms of use.