Herein are notes, plans, and rationales for an Interactive Fiction ("IF") engine I am developing.
This is a preliminary rough draft, liable to change.
Unfinished and ill-formed portions are show in gray.
This document in part expresses my feelings. Try not to take it personally, and I will try not to take critiques too personally.
None. This is my hobby project. It is a very hard project and I have other hard stuff in my life, so I don't manage to work on it very steadilly.
SHRDLU was an impressive piece of software, and yet in the decades since, it seems virtually nothing has been done to improve upon it! Come on, people, conversing with an artificial intelligence! Don't you want that? Isn't that utterly fascinating? [Talk more about how SHRDLU worked?]
OK, so text adventures came out after SHRDLU, but they were brutally dumb in comparison -- and they still haven't caught up. Infocom upgraded the parsers to handle compound objects ("get the lamp and the sword", "get all but the babelfish"), and that's about where they've stayed. You can still only ask about two questions, "Who am I?" and "What is a grue?", and many games use the "ask/consult character/book about subjectname" syntax. Little attempt is made to fully simulate such dynamic objects as rope, matches, or dirt. NPCs are sometimes beautifully painted, but are pieces of cardboard controlled by state machines with only a few states and typically have very narrow uses within a game.
[This paragraph should get more research] The languages I've looked at that were designed for creating interactive fiction are also a mixed lot: many betray a cramped heritage from the 80s (or 70s), or weird me out in an attempt to be useable by non-programmers. In particular, they're all too imperative, with the exception of Inform 7 with which I have other dissatisfactions. For being so object-oriented, they're an oddly muddy lot, and, by the way, none seem very good at dynamic creation of objects.
Oh the wonder! [incomplete]
So what do I want?
What I want is to create an engine that distinctly improves the state of the art of interactive fiction.
Or to create a SHRDLU that never was. To create an AI.
How to satisfy my four objectives?
It is the job of the parser to take potentially ambiguous input and resolve its meaning into an unambiguous command, question, or declaration. Chart parsers can do a reasonably efficient job of finding all possible valid sentences without wasting too much time chasing down interpretations that turn out to be invalid.
The current game situation should be consulted to help resolve ambiguity. In particular, only things that have been reported to the player up to that point in the game should be relevant, and for the most part only things in the immediate vicinity of the player. The simulation should not be consulted directly, though: only the world as it has been experienced by (reported to) the player should be known. Just as an NPC will need to remember its own history in the game, the player character should remember what has gone on before and that knowledge used to understand what the player is commanding/querying/declaring.
> where is the lamp? You last saw it in the cellar.
More to say here.
Most knowledge about the game needs to be built into the game, but when the player says something otherwise not understood, it could be useful to consult WordNet to provide better responses to the player, such as by suggesting synonyms or even just knowing what kind of sentence the user typed.
Planning algorithms can enable NPCs to solve problems. Their main drawback is that they can only understand the world essentially in terms of logic: they need to be able to chain together declarative causes and effects. This may have a dramatic effect on how the simulation engine is designed.
Provide NPCs with an ability to perceive the simulation that is more on a par with what the player gets. Provide them with memory of experience.
It should be possible to run a NPC separately from the simulation, with little or no more priviledged knowledge of what is going on than a player would get.
Even if planning algorithms turn out to be kind of useless (See "Elephants Don't Play Chess"), all other AI is still going to depend critically on a good ability to perceive the environment, which still impacts the design of the simulation engine.
More to say here, I think.
Modeling the objects in an interactive fiction simulation is I think suprisingly hard. A player may try any action on any object, and making all those combinations respond well is challenging. Existing interactive fiction systems seem to me fractured and/or over-complicated in how they do this.
For a long time I could really only think "the behavior of objects should be determined by their properties, not by their class". What I now understand is that the flexible behavior of objects in an interactive fiction simulation is in fact poorly modeled by traditional object oriented programming (OOP). In traditional OOP, the behavior (methods) of an object are determined by examining the class of the object, which itself can inherit behavior from other classes.
Determining behavior via a static heirarchy of classes isn't at all flexible or concise. Using a programming language's features to dynamically mutate classes or their inheritances improves on that, but is generally not good either, in part because in popular langages those features are murky, if available at all. My commom-sense reasoning is still telling me that "this glass bowl behaves as it does, not because it is an instance of a platonic-like class 'bowl', or classes 'bowl' and 'glass', but rather becuase it has the properties of bowl-ness and glass." I think simulation objects should be or contain a collection of properties that provide behavior in a more dynamic, less rigid fashion than traditional classes, perhaps using prototype-based inheritance for efficient encoding of attributes and behavior. An object's gross function could be dynamically changed by altering its collection of properties on the fly.
But also, in traditional OOP the decision of which action to perform is too simple: the first handler found for an action is used. That handler may choose to invoke other handlers of the same action, but other handlers are strictly at the mercy of whichever one is first in line. And so a typical IF engine will engage in a kind of dance around each action: "may" the action be performed, "can" the action be performed, the action is about to be performed, the action is being performed, the action has been performed. The purpose of these stages is not to closely model the "doing" (or "doing-ness" or "being-ness" or potential or indeterminacy) of an action, but rather to provide plenty of cracks into which exceptions can be wedged at various stages of processing. It is an ad-hoc way of deciding what to do and then doing it. I think what this calls for is a new way of interrogating all potential actions and integrating their responses and effects. Every aspect of every component of the simulation that cares to could register an opinion as to what effect an action should have, and their responses could be averaged, prioritized, or reasoned about, or whatever. The various possible effects should be encapsulated in the reponses in such a way as to defer their actual effectiveness until such time as the decision had been finalized. I think this can all be achieved by taking a more declarative programming approach.
Perhaps something like:
turn_on(darkroom, Obj) ⊢ can_see(player, Obj) → set_attr(Obj, enabled) turn_on(player, Obj) ⊢ can_see(player, Obj), hands_free(player) → set_attr(Obj, enabled) turn_on(X, lamp) ⊢ installed(batteries) → set_attr(lamp, illuminating)
See: Declarative Programming , Multiple dispatch , Polymorphism in object-oriented programming
I'm not sure how well object-like representations can mix with an inference engine.
When there is a relationship between two objects, which one should manage the relationship? Since the predominant relationship in interactive fiction simulations is that of containment (objects inside rooms, and objects inside objects), it would seem natural for the container to be in charge of relationships to the contained objects, and that is what many text adventures do. But what about other relationships, such as part-of or oriented-toward or positioned-near-to? Each different kind of relationship raises questions of management anew. The solution is to make the management (and storage) of relationships external to all objects. Essentiallly this is a list of 3-tuples:
[ (lamp, in, box) , (box, on, shelf) , (shelf, in, pantry) , (table, in, pantry) , (player, on, table) , (shovel, held_by, player) , (pantry, south, kitchen) , (kitchen, north, pantry) , (compass, points_to, magnet) ]
Relationships should have their own heirarchy of meaning: "held_by" is a kind of "in", and "in" and "on" are kinds of dependencies, "south" and "north" are kinds of exit relationships. "points_to" would be a kind of orientation relationship.
Actions should be implemented outside of simulation objects. They should inherit from each other: throwing is a kind of dropping is a kind of object manipulation.
Much more to say here.
An adventure game engine should include a physics engine: a standard system for handling the "physical" interaction of objects, with generic ways to hook into and modify the behaviors of specific objects or sets of objects. Adventure game physics engines I have looked at have an incompletely-factored bunch of rules and hooks for doing this. I want to completely factor this, so that it becomes easier to customize.
A case study could be "taking a lamp" simulated as the locating of the lamp, reaching for the lamp, and carrying the lamp back to inventory:
A room contains a table and a shelf. The agent is standing on the table. The lamp is in a box on the shelf. "take lamp" agent.parse-and-execute("take lamp") this is succesfully translated to the logical command: agent.take(lamp) what lamp? which lamp? ** FIXME ** can agent take lamp? can agent locate lamp? can agent can see lamp or find-by-feel lamp? can agent see lamp? can light (a photon-like semi- or pseudo-object) get from lamp to the agent? is the lamp self-illuminated? (NO) is there light falling on the lamp? does the box permit the interior traversal of light? for each other object in the box: does this object provide light? does this object directly provide light? does this object contain other objects? does this object permit light to traverse out from contained objects? recurse does the box permit light to traverse in from without? for each other object on the shelf: does this object provide light? ... does the shelf permit light to traverse in from without? for each other object in the room: does this object provide light? ... does the room itself provide light to its interior? (YES) does the room permit interior traversal of light? does the shelf permit light to traverse in from without? does the shelf permit interior traversal of light? does the box permit light to traverse in from without? does the box permit interior traversal of light? can the lamp be seen? does the lamp contrast with its surroundings? does the lamp have substance? does the lamp reflect light in the direction of the agent? does the box permit interior traversal of light? does the box permit light to traverse out from within? does the shelf permit interior traversal of light? does the shelf permit light to traverse out from within? does the room permit interior traversal of light? does the table permit light to traverse in from without? does the table permit interior traversal of light? can agent see the light? is agent blindfolded? is agent blind? can agent reach lamp? can agent-hand (a pseudo- or semi-object) get from agent to the lamp? does agent have agent-hand? can agent-hand move? can agent-hand transit interior of table? can agent-hand transit off of table? can agent-hand transit interior of room? can agent-hand transit into shelf? can agent-hand transit into box? is box open? can agent-hand grasp lamp? can agent-hand grasp? is agent-hand full? is lamp graspable? does lamp have substance? is lamp too slippery? [ lamp is moved to agent-hand ] can agent-hand carry lamp back to agent? is lamp carryable by agent-hand? is lamp carryable? is lamp moveable by agent-hand? is lamp moveable? is lamp fixed to anything? does box permit agent-hand to carry out lamp? does box permit agent-hand to traverse out from within? does box permit lamp to traverse out from within? does shelf permit agent-hand to carry out lamp? does room permit agent-hand to carry lamp within? does room permit agent-hand to traverse interior? does room permit lamp to traverse interior? does room permit agent-hand-with-lamp to traverse interior? does table permit agent-hand to carry in lamp? does agent permit agent-hand to carry in lamp? does inventory permit containment of lamp? does inventory have room for lamp? [lamp is moved from agent-hand to inventory]
Perhaps the biggest problem is that procedural code is opaque to planners and thus to NPCs. Given a procedure that implements a particular verb or object, it is impractical to automatically extract the preconditions and postconditions needed by a planner. Preconditions and postconditions could be separately coded for the planner, but they would inevitably differ from actual procedural code in the simulation, and in any event would be a duplication of effort. Instead I want to, as far as possible, represent verbs, objects, relationships, physics, and events in a declarative manner akin to what can be done in Prolog.
NPCs should, in priciple, be able to perceive and do anything a player might. All events that happen around them, whether as a result of their own actions or not, should be sent to an NPC in a form that can be understdood by the NPC. Some form of logical description would be appropriate. Events also may be perceived by the player, and need to be transmitted to the console in human language (possibly with text mark-up). The processing of actions and their results should be the same, regardless of what agent may be instigating them. Action results should be events broadcast to all nearby agents. This is also more like how the real world operates: most observable effects can be observed by everyone who happens to be nearby. An event (or percept) should be a set of encodings, one in logic and zero or more in various human languages, of the facts of something that happened. Some or many percpets with logical-only encodings could be translated automatically to English. The logical-description portion of an event would typically include the following information:
Scope: How widely should this event be broadcast?
What: What changed, in what way?
Who: Who caused the event, if anyone?
While each human language portion of an event would be a flexible combination of static text, display mark-up, dynamically substituted/expanded/generated text, and grammatical mark-up. Grammatical mark-up would be for describing the same event in both second and third person, which requires alternate grammatical subjects and gender, mainly.
For example, if an agent digs a hole in the garden, the resulting percept might look something like this:
[ LOGIC: created(who=Agent, where=in(Garden), what=Hole) , ENGLISH: capitalize(subject(Agent)) person(dig,digs) "a" name(Hole) "in the" Where "." ]
Which for a human player digging the hole would output "You dig a shallow hole in the garden.", but for a human player that just happened to be in the garden when Floyd digs a hole would output "Floyd digs a shallow hole in the garden."
Each step in the earlier "take lamp" example would generate an event/percept/result for success or failure. Outer clauses would transform or discard the results from inner clauses, effectively filtering out excessively detailed results.
I anticipate the most performance demanding components of this system will be:
I don't really believe in "programming languges for non-programmers". If you want to create a world — a working world powered by a machine in the real world, not just powered by the imagination of the reader — you should expect to do some engineering. If you don't know how to do the engineering then you should expect to engage in a dialog with an engineer (who will do it for you), which could be an AI that is providing constant feedback and asking guiding questions, but that is not a programming language.
What the engineer should want is a language that provides power and correctness. This may mean learning new and conceptually challenging ideas (like calculus, logic, recursion, or the dubiously-named "monad" of Haskell fame). This should not mean ever more complicated syntax and semantics, English-looking or not. A powerful language should have composability most of all (often provided by a variety of "scoping" mechanisms, as well as orthogonality of language features), as well as conciseness and a good standard library and tools. To ensure correctness, a language should be unambiguous and enable static checking. Power and correctness both imply readability. Since we do not know how to write software in pure math, then for me these are best satisfied at the current time by Haskell.
If I ever get this system working well, adding an authoring system which provides a kind of dialog-with-an-engineer could be a very worthwhile follow-up project.
Users of Inform and TADS are used to being able to compile their games into data files that can then be read/executed by a standard un-customized interpreter. In the software industry in general, however, it is totally common for programmers to compile a possibly customized framework/engine/library into their application and deliver a complete, self-standing executable.
As far as reasonably possible, I would like to separate the "unchanging" parts of my system from the specifics of any given game. Presumably much of it could be appropriately factored into libraries. Anything that is stored in or runs inside an interpreter should be easy to store outside the executable.
My goal is not to create a new language, but if that's where I end up, then presumably my program becomes the interpreter and all game-specific code gets stored separately.
Don't forget that games need to be saved in-progress and later restored. As long as the entire game state (between turns) is accessible to a serializer, then it's doable. State that is hidden inside continuations or other functions will be hard or impossible to save and restore. Do not store Haskell functions in the game state without some means of solving this!
I intend to publish software under the GPL v3 or LGPL eventually — either when I'm no longer eager to do it all myself, or if it gains a significant following, whichever comes first.
|Task Environment||Interactive Fiction||Multi-User Dungeon|
|Number of Agents?||Multiple (Player + NPCs)||Multiple|
|Deterministic state?||Mostly Determined||Mostly Determined|
|Episodic or Sequential?||Sequential||Sequential|
|Static?||Static||Dynamic (though relatively slow)|
|Discrete?||Mostly Discrete||Fairly Discrete|
|Are Rules Known?||Known||Known|
Or, Stuff That Went Into My Head That Perhaps Should Go Into Yours Too.
Within each subject, In approximate order of my reading them.