Introduction to Prolog

PROLOG uses the concept of verifying the truth of a query against a specific data base as a programming language. The data base contains facts and rules which the PROLOG system can use to deduce other true statements in a systematic fashion. The triggering query can contain variables (in the sense of mathematical logic). PROLOG then finds values for these variables which allow the truth of the fully-specified query to be deduced from the data base. The final value of the "top-level" variables are reported on the output.

A fact is a written statement like "father(john, bill)." Here, "john" and "bill" are individual constants (as opposed to variables -- variable begin with either "_" or a capital letter). "father" is the name of a predicate -- a function from constants to {true, false} which in this case is true ONLY for the specific constants mentioned. father(x, y) is false (actually, not deducible). Facts and rules are presented to PROLOG in a special file, which can be "consulted" by the program. Queries look like facts, but are presented on standard input. So if the above fact was in the data base, the query ?- father(john, bill). would get the PROLOG response "Yes". (?- is the PROLOG prompt for queries.) Queries can contain variables so ?- father(john, A). would get the response "Yes. A = bill." If there is more than one way the query can be satisfied, PROLOG can report all such ways. If you follow PROLOG's response with ";", PROLOG tries to find another satisfying substitution, reporting it if one is found, and otherwise responding "No".

A rule takes the form predicate(args) :- query-1, query-2, … query-n. This rule is satisfied if a substitution for each variable in the rule can be found which satisfies all the queries on the rule's RHS. This "proves" the predicate on the LHS. When a query is given, PROLOG matches the query against a rule head with the same predicate name and number of arguments as the query. The match may give values to some variables in the rule, or in the query (query variable names are considered never to match rule names). Each query in the rule RHS is then tried in order, left to right, finding new settings for more rule variables, or reporting that the query cannot be satisfied. If some RHS query cannot be satisfied, the next rule for this predicate name is tried. This defines a form of exhaustive trial-and-error search called backtrack programming. Once the last RHS query is successfully satisfied, the original query is reported satisfied, along with satisfying values for any variables appearing in that query.

Example:

mother(john, mary). father(sue, jim). father(jim, john). father(mary, tom).

grandfather(X,Y):- father(X,F), father(F,Y).

grandfather(X,Y):- mother(X,M), father(M,Y).

?- grandfather(john, X).

?- grandfather(X, john).

PROLOG defines lists, using the notation [1, 2, 3]. A rule can determine the head and tail of a list using the pattern [Head | Tail] as an argument. So we can define

car([C| Tail], C).

?- car([1,2,3],A). Will get the response "Yes. A=1". The matching of the query to the data base assigns C=1, and then C=A. It assigns [2,3] to Tail, and then discards that result.

A null list is written [], and does not match the [Head | Tail] form.

We can define append:

/* append(A, B, C) is true when C is list A with list B appended to it. */

append([],B,B).

append([H|T], B, [H|C]) :- append(T,B,C).

 

This says that if A is [], the answer is B. Otherwise, the answer is the head of A cons'd to the result of appending B to the tail of A. Note that the "cons" operation is "done" in the rule head -- the pattern [H|C] matches a new symbolic expression derived from the value returned by the append on the rule RHS.

PROLOG defines some built-in functions for arithmetic. You can write ordinary expressions like 1+2+A*3 as predicate arguments. However, these are symbolic expressions, and are not "evaluated" to numbers. To force evaluation, the special predicate "is" must be used. "X is 1+2*3" is true, and sets X to 7, the value of the RHS. The RHS of "is" cannot contain variables that have not yet (in the Left-to Right order) been given values.

/* length(L, N) is true when the number of elements in list L equals N. */

length([], 0).

length([_|T], N):- length(T,M), N is M+1.

If I had tried to write the "is" before the "length()" in the RHS, PROLOG would object. If I had written length([_|T], 1+M) as the rule LHS, I would have gotten a symbolic expression for the length, not a number.

The "cut": The PROLOG approach to satisfying a goal query uses backtracking every time a RHS clause fails; sometimes this leads to ridiculous results. There is need for some way to tell PROLOG to "stop searching -- you have found the only way I want you to succeed in matching this goal". The special predicate "!" serves this purpose. "!" always succeeds ONCE. If PROLOG backtracks to "!", PROLOG immediately abandons further efforts to satisfy the parent predicate. That is, PROLOG refuses to try alternatives for RHS queries to the left of "!", and PROLOG also refuses to try any further rules for this predicate.

/* sum_to(N, R) is true when R is the sum of the integers 1, … , N. */

sum_to(1,1):- !.

sum_to(N,R) :- M is N-1, sum_to(M, R1), R is R1+N.

Changing the Data Base during execution: PROLOG allows use of the special predicate "asserta(pred(args))" during program execution. This predicate is always true, but it has the side-effect of putting the fact "pred(args)" into the data base, as the first rule for pred. This can be used to speed up recursive search, by changing a recursive program to a "memo function". Such a function remembers if it has been called before with the same arguments, and if so, just returns the result it got previously, without doing any further recursive search.

/* fibb(N, M) is true if M is the N-th Fibbonacci number. */

fibb(1,1).

fibb(2,1).

fibb(N,M):-N1 is N-1, N2 if N-2, fibb(N1, M1), fibb(N2, M2),

M is M1+M2, asserta(fibb(N,M).

This changes an exponential-time program for Fibbonacci number calculation to one that runs in linear time, because the matching of a query against a data base rule head can be done using a hash table, essentially in unit time.

Debugging:

PROLOG has sophisticated means for helping your analyze your program during execution. Evaluating the predicate "trace." (no arguments) turns on an execution mode in which PROLOG waits for user interaction before proceeding with each "execution event". Such an event takes place when a predicate starts to be evaluated (is called), or when one "succeeds" or "fails". (The even "back-to" is also monitored, but doesn't wait). When an event occurs, PROLOG prints the predicate, and the current value of its arguments, using notation like "_16384" for uninstantiated variables. The user can prompt with "h" to get a list of possible action prompts that can be typed. These include: RETURN -- to move into the predicate, and "s" to execute the predicate to its succeed or fail exit. One can also plant "spy points", by evaluating "spy <name>", where <name> is the name of some procedure in the data base. When a spy-point is reached, trace mode is turned on. During tracing, a prompt with "q" executes either to the end of the current predicate, or to the next spy point before prompting.

More examples:

(Not tested.)

comb(N, I, L) is true if and only if L is a list of all the length I subsets of the set {1, …, N}. I and N are instantiated as integers on entry.

Assume the data base contains the representation of a directed graph, as follows:

Each node i in the graph is accompanied by the facts: arcs(i, dl), where dl is a list of [ node, length] sublists. If [j, x] appears in dl, the graph has an arc of length dl from node i to node j. The problem is to find a path of least total distance (sum of its arc lengths) from node 0 to node N: path(N,L, D) is true when L is such a path (a list of nodes), and D is the total distance of path L.

A method that works well is due to Dijkstra: Label those nodes whose minimum distance from 0 is known with that distance. A new node not yet labeled can be labeled if that node is one whose direct distance from any labeled node is smallest. The currently-labeled nodes are maintained in a set H using facts that are added to the data base: inH(i) indicates that i is in H. Each node has a distance: d(i, distance) and these facts are updated during the computation. The graph's node set consists of all nodes numbered 0..N. A computation stage occurs as a new node is added to H. First, the arcs leaving that node are followed, and the tentative distance for arcs at each arc tip are adjusted. Then, the set of nodes not in H are searched for a node with smallest distance among these nodes. One node at minimum distance is selected and added to H, beginning the next stage. Once all nodes reachable from 0 have been added to H, the computation ends (it can end sooner, when N enters H). The list returned can be discovered from the d() facts in the data base, or it can be computed backwards from N, by a trail t(j, prev) of facts entered when node "prev" minimizes j's distance. During the computation, it may be helpful to maintain also the set of nodes that are reachable, and are not yet in H.

The reason modification of the data base is so important is that such modification is the only way to keep track of facts that are discovered during rules that eventually fail. The backtracking that occurs during "fail" processing discards all the variable instantiations that lead to the "fail". So you can't ask PROLOG to generate a list of all the nodes that have a d() entry (and are thus reachable) without modifying the data base. There is a built-in predicate "setof(X,P,S)" which is true if and only if S is the set of all instantiations of X that make P provable, where S is non-empty. P is a predicate that presumably mentions variable X. So setof(I, d(I,_), R) should succeed, with R equal to the set (list) of node numbers that have d() entries. The setof function can be defined, using asserta() and retract() to update R as entries are found. A "fail" predicate then causes P to find the next provable case.

setof(X, P, S):- asserta(r(last)), P, asserta(r(X)), fail.

setof(_,_,S):- collect([],M), !, L=M.

collect(S, L):- getnext(X), !, collect([X|S], L).

collect(L,L):- L \== [].

getnext(X):- retract(r(X)), !, X \== last.