Advance Usage¶
Command-Line Options¶
RA provides a number of command-line options. To see a complete list,
run RA with command-line flag -h
(for help). Some of the most
useful ones are listed below.
Use
-c
(--configfile
) config_file to specify a RA user configuration file. Among other things, the user configuration file is useful for telling RA how to connect to your own database server. See Configuration File for more details.Use
-v
(--verbose
) to have RA show more information. This flag is useful for debugging, as RA will show you a tree representation of the RA query you entered. Use-d
(--debug
) if you want to see even more information, e.g., the SQL translation of your RA query.Use
-i
(--inputfile
) input_file to specify a file of statements for RA to run. In this case, RA runs in non-interactive mode (you won’t get the RA prompt), and simply print out the output from executing the statements. Here, to get a more complete record of the execution, you can additionally use-e
(--echo
) to tell RA to echo all statements it reads to the output as well.Use
-o
(--outputfile
) output_file to specify a file to which RA will write its output (in addition to printing that out). Together with-e
,-o
will give you a complete record of your RA session, which is especially useful for recording what you did in an interactive session.The optional source argument (without a flag) specifies the data source you want to connect to. It can be the name of a section in your user configuration file (under which RA can find detailed connection information), or the name of the database you are connecting to (assuming default for other connection information). See Configuration File for details.
Data Types and Operators¶
RA knows about five basic value types: boolean
, number
,
string
, date
, and datetime
. It doesn’t distinguish
integers vs. floats, or fixed- vs. variable-length strings.
Note
RA also uses two special types unknown
and any
; see
Type Checking below for how they are useful.
Literals¶
As with SQL, string literals are enclosed in single quotes, e.g.,
'Amy'
. If you need to include the single-quote character itself
in a string literal, use two in a row, e.g., 'Alivia''s Durham
Bistro'
.
You can enter date
and datetime
literals as strings (e.g.,
'1975-01-01'
and '2017-06-01 21:00:00'
); RA relies on the
underlying database system to convert them implicitly into date
s
and datetime
s (see Type Checking below).
Operators and Functions¶
Comparison operators
<
,<=
,=
(equal),<>
(not equal),>=
, and>
work for pairs ofnumber
s,string
s,date
s, ordatetime
s, and return aboolean
.Arithmetic operators
+
,-
,*
, and/
work on a pair ofnumber
s, and return anumber
.Concatenation operator
||
works on twostring
s and returns astring
.Boolean operators
and
,or
, andnot
work onboolean
values and return aboolean
value.As in SQL, operator
like
matches astring
against a pattern (represented as astring
). For example,bar.name like 'A%'
returns true ifbar.name
starts with “A
” (%
in a pattern matches any sequence of 0 or more characters).
The list above is not exhaustive. For more details, please refer to
the declarations in the default_functions
setting in the
default configuration file (this file is
not meant to be modified by users); see Specification of Built-in Functions for the format of these declarations.
In addition to the operators above, RA also supports functions with
the standard syntax of func_name(
arg_1,
arg_2,
… )
. For example, RA on top of PostgreSQL understands
now()
(which returns the current datetime
) and ceil(1.618)
(which returns the ceiling of 1.618). Despite the SQL standard,
different database vendors support different SQL functions. For the
most part, RA simply translates these function calls verbatim to SQL.
However, you can declare a list of functions supported by a specific
database vendor in a configuration file (again, see
Specification of Built-in Functions for details) so RA can
provides additional type-checking (see Type Checking below).
Type Checking¶
RA assumes that an operator or function takes a number of values as input arguments, and returns a single value as output. Required arguments (possibly none) always come first, followed by optional ones (possibly none). Finally, a function can be declared to take an arbitrary number of additional arguments beyond those listed; these additional arguments all must have the same type as the last one listed.
Here are some examples (see Specification of Built-in Functions for exact syntax and where to specify them):
now() -> datetime
This function takes no input arguments and returns a single
datetime
value.
substr(string, number, number?) -> string
This function takes a string and a number as its required arguments, and another number as its optional third argument, as indicated by the
?
suffix; the function returns a string.
greatest(number, number, number*) -> number
This function takes two or more numbers as input arguments, and returns a number. Here,
*
indicates zero or more occurrences.
foo(date, string?, number?, any*) -> number
This toy example specifies a function with a required date input argument, optionally followed by a string, a number, and then any number of additional parameters with any type. Here,
any
is special type to which all data types conform to.
On a high level, RA type-checks each use of operator/function in a query against known declarations. Several rules are worth noting.
Implicit type conversions
RA assumes that the underlying database system can automatically convert
datetime
todate
values, and convertstring
todate
anddatetime
values. Therefore, RA is able to deduce thatdrinker.dob <= '1975-01-01'
(where attributedrinker.dob
has typedate
) is a valid comparison, because'1975-01-01'
can be converted to adate
. If thestring
value doesn’t translate to a validdate
, RA relies on the underlying database system to catch the error at run time.
Positional arguments
RA assumes that arguments are “positional.” In other words, for each argument type listed in a declaration, RA expects an argument of a conforming type at the same position. Consequently, if you want to supply a optional argument, then all arguments preceding it in the declaration (including optional arguments) must be supplied. For example,
foo('1975-01-01', 'Watergate', 1)
conforms to the declaration offoo
above, whilefoo('1975-01-01', 1)
does not.
Polymorphic operators/functions
In general, RA allows each operator/function to have multiple signatures. Based on the order in which they appear in configuration files (see Specification of Built-in Functions for details), these declarations are organized as a list. To check an invocation, RA walks down this list. As soon as a RA encounters a declaration that this invocation’s input arguments conform to, RA considers the invocation to be correct and assumes that its return type is the one associated with that declaration.
Warning
An implication of this overly simplistic rule is that the order of declaration matters, unfortunately, sometimes in subtle way. For example, in the sample configuration file for PostgreSQL, declarations for
+
are listed in the following order:PLUS(number, number) -> number PLUS(date, number) -> date # here number means the number of days PLUS(number, date) -> date # dittoSuppose, for the sake of argument, that an implicit conversion from
number
todate
is possible (it is not in RA). Then, had we instead listedPLUS(date, number)
beforePLUS(number, number)
, RA would infer that1+2
returns adate
!A perhaps better alternative for a future version of RA would be a picking the declaration that “best fits” the given invocation (i.e., requiring the least amount of implicit type conversion).
Unrecognized functions/SQL types
If RA encounters a function f that it doesn’t know about, it assumes that f can take any number of arguments of any type, and that f returns a value of special type
unknown
. Typeunknown
is also used when RA encounters an exotic SQL type that it doesn’t know (e.g., a user-defined type for some database column). RA assumes Typeunknown
can be implicitly converted to any other type. This rule is extremely lenient by design, because different database vendors support many, many different built-in functions that would be impossible for RA to track. This rule essentially allows RA to pass on such expressions to the underlying database system, which will eventually check them in SQL.
Writing Queries¶
Relation Schema and Attribute References¶
Note
Before delving into details, let’s start with a bit of motivation. The relational model in its purest form requires distinct attribute names within each relation (because a tuple is considered as a function from attribute names to values), including result relations returned by relational algebra expressions; therefore, each attribute can be referenced uniquely by its name. In practice, however, to simplify expressions, RA (as well as most textbooks and courses that cover relational algebra) allows duplicate attribute names and, in case of confusion, allows them to be distinguished by prefixing them with names of relations where they come from, e.g.:
Drinker \join_{Drinker.name=Frequents.drinker} Frequents;
\select_{Drinker.name=Frequents.drinker}
(Drinker \cross Frequents);
However, the notion of “relations where they come from” needs
clarification when inputs to an operator are themselves complex
queries. Moreover, it is possible for attributes to become
indistinguishable even if we qualify them with relation names,
e.g., in the output of Drinker \cross Drinker;
.
Furthermore, the projection operator in relational algebra is often extended to allow output attributes computed by expressions. It is not always obvious how to name such output attributes; as a result, it may become impossible to refer to refer to such an attribute by name.
Therefore, we need to clarify how relation schema and attribute references work in RA.
In RA, a relation can a stored database relation, or the result of executing a relational algebra query (or subquery). The schema of the relation is a list of attribute specifications, while each attribute specification is a triple (rel_name, attr_name, attr_type). Intuitively, rel_name is the name of the relation where this attribute originally comes from. Both rel_name and attr_name can be optional. Within a relation, there is no requirement that attr_names, or even (rel_name, attr_name) pairs, are unique.
Some relational operators allow you to write expressions involving references to attributes in input relations. An attribute reference must be in one of the following forms:
attr_name (just the attribute name by itself);
rel_name
.
attr_name (attribute name prefixed by relational name).
For an attribute reference in an operator to be valid, there must be exactly one attribute specification in the input relation schema(s) matching the reference. If there are more than one matching attribute specifications (possibly from different input relation schemas), the attribute reference is ambiguous.
For a stored database relation, RA uses its column names and types for the attribute specifications (mapping SQL data types to RA ones as appropriate; see Data Types and Operators); the rel_names get the name of the database table.
The section on Relational Algebra Operators will further spell out how each relational operator generates its output relation schema from the input relation schema(s). In general, attribute names may no longer stay unique, and you may lose the ability to distinguish or reference certain attributes.
If you turn on the verbose option (-v
; see Command-Line Options) when running RA, RA will print out an operator tree for your
query and show the schema for each result relation (intermediate or
final).
Relational Algebra Operators¶
This section will not define what these operators do on data—see Basic Usage or refer to standard textbooks. Here, we instead focus on issues more specific to RA.
Selection: \select_{
condition}
input_relation
Here, condition must have type boolean. The output relation schema is the same as the input one.
Projection: \project_{
attr_list}
input_relation
Here, attr_list is a comma-separated list of expressions that specifies the output attributes. The output relational schema is the list of output attribute specifications, ordered according to attr_list. For an output attribute, if its expression is simply an attribute reference, its specification will be the same as the specification of the input attribute being referenced. Otherwise, the output attribute specification will have no relation name or attribute name, and its type will be what RA infers for the expression.
Theta-Join: input_relation_1 \join_{
cond}
input_relation_2
Here, condition must have type boolean. The output relation schema is the concatenation of the two input relation schemas. RA will warn if some attribute from the first input can be potentially confused with some attribute from the second input, which may create a problem when you want to refer to them later in the join output. (Here, RA only warns about ambiguity caused by the join; if either input already contains ambiguously named attributes by itself, they would have been caught and reported earlier.)
Natural join: input_relation_1 \join
input_relation_2
RA looks for matches of pairs of attributes—one from each input—with identical (and non-optional) names. RA will generate an error if an attribute is involved in more than one match, or the two attributes involved in a match have types that cannot be equated. If no matches are found, natural join degenerates into cross product, and RA will generate a warning.
For each pair of matching attributes, RA equates them in the join. The output relation schema consists of all attribute specifications from input_relation_1, followed by those from input_relation_2 that are not involved in any match. Consequently, you can still refer to a join attribute by name in the output relation, but prefixing it with the name of input_relation_2 won’t work any more (incidentally, prefixing it with the name of input_relation_1 would still work, but it is not recommended).
Cross product: input_relation_1 \cross
input_relation_2
The output relation schema is the concatenation of the two input relation schemas. Again, as with the case of theta-join, RA will warn if some attribute from the first input can be potentially confused with some attribute from the second input.
Set union, difference, and intersection:
input_relation_1
\union
input_relation_2input_relation_1
\diff
input_relation_2input_relation_1
\intersect
input_relation_2Two input relations must have the same number of attributes and every pair of corresponding attributes must have identical types (which is a stronger condition than being able to equate them). RA will warn if some pair of corresponding attributes have different names.
The output relation schema consists of all attribute specifications from input_relation_1. Consequently, you can still refer to attributes by their names from the first input relation, but you lose the ability of referring them by names from the second input relation. (Allowing the later would necessitate remembering multiple possible names for each output attribute, which can get quite confusing; for this reason, RA adopts the convention of always going with the first input relation.)
Rename:
\rename_{
new_attr_names}
input_relation
\rename_{
new_rel_name: *}
input_relation
\rename_{
new_rel_name:
new_attr_names}
input_relationThe output relation schema consists of all attribute specifications from input_relation. If new_rel_name is given, the relation name for all attribute specifications will set to new_rel_name; otherwise the relation name will be unset (the relation names from the input schema will be forgotten). If new_attr_names is given, it will be used to set the attribute names in the output attribute specifications; otherwise the attribute names from the input relation will be retained.
Aggregation and grouping:
\aggr_{
aggr_attr_list}
input_relation
\aggr_{
group_by_attrs:
aggr_attr_list}
input_relationHere, group_by_attrs is a comma-separated list of expressions used for grouping tuples in input_relation. Most commonly, they are simply attributes from input_relation. All tuples that agree on the values of group_by_attrs go into the same group. In the form without group_by_attrs, all tuples in input_relation will be in one single group.
aggr_attr_list is a comma-separated list of expressions, each of which must evaluate unambiguously to one value for each group. If an expression in aggr_attr_list references a non-group-by attribute, that attribute must (eventually) be input to an aggregate function, which will compute a single value over all tuples in each group.
The output attributes include group_by_attrs followed by aggr_attr_list, in the order of specification. For an output attribute, if its expression is simply an attribute reference, its specification will be the same as the specification of the input attribute being referenced. Otherwise, the output attribute specification will have no relation name or attribute name, and its type will be what RA infers for the expression.
Views¶
RA lets you define “views,” which may be thought of as temporary, relation-valued variables holding the results of relational algebra expressions. To define a view, use the syntax:
view_name :-
view_definition_query;
Once you define a view, it will show up in \list
, and you can use
it later in queries (including queries that define other views) as if
it’s a database relation. Whenever RA evaluates a query, it basically
“expands” any view reference in the query by replacing the view
reference with the corresponding view definition query. If that
definition query itself contains other view references, RA will expand
them recursively. In the end, RA always evaluate a query starting
from database relations. The output relation schema from a view
reference is the output relation schema for the corresponding view
definition query, but with all relation names in all attribute
specifications set to the view name.
To undefine a view (i.e., to forget its definition), use the command
\clear
view_name;
. RA won’t let you undefine a view if it
is used to define other views. However, you can add !
after
\clear
to undefine view_name as well as every view dependent on
it (directly or indirectly) “by force.” You can also use \clear *
to undefine all views.
RA allows you to redefine a view. It will even let you redefine a view v when there are other views defined using v, provided that all (directly or indirectly) dependent view definitions still make sense with the redefined v, and that this redefinition doesn’t lead to any circular definition.
Finally, you can save view definitions to files and load them back
into an RA session. The command \save
view_name '
file_name';
saves the definition of the given view as well as
those of all prerequisite views in the specified file; note that
file_name need be surrounded in single quotes. If you do not
specify the file name, the default file name will be view_name.ra
. You can also use *
in place of view_name; in this case
RA will save all your views in a file whose name defaults to
views.ra
. RA will not let you overwrite an existing file, unless
you add !
after \save
.
Warning
Once you exit from RA, you lose all views you have defined in the session. So make sure you save what you want before exiting!
Other Features¶
This section describes other useful RA features.
Comments:
RA supports C/C++/Java-style comments.
//
starts a single-line comment; text following//
is considered comment and will be ignored./*
and*/
start and end (respectively) a possibly multi-line comment; text between them is considered comment and will be ignored.
Command-line history editing:
On systems that support the GNU Readline, RA provides command-line input history and editing using arrow keys. For example, Up/Down recall previous/next lines, and Left/Right move within the current line.
Executing a script: \source '
ra_file';
This command makes RA read statements from the specified file and execute them. Note that ra_file must be enclosed in single quotes. The file should be just a simple text file containing RA statements and comments. This file can be prepared manually with a text editor, or it can be the result of a
\save
command.
Executing SQL: \sqlexec_{
sql_statement };
With this command, you can send a SQL statement to the underlying database. RA (and relational algebra) doesn’t have its own language constructs for data definition (such as
CREATE TABLE
in SQL) or data modification (such asINSERT
,DELETE
, andUPDATE
in SQL), this feature conveniently allows you to do all that without leaving RA. Note that sql_statement should NOT be terminated by;
(instead,;
should terminate the RA command, following}
). Some examples of using\sqlexec
can be found in the.ra
file for creating the sample database.
Configuration File¶
RA relies on a system configuration file,
which you should not modify normally. Then, RA looks for a user
configuration file named .radb.ini
in your home directory (the
definition of “home directory” varies across operating systems).
Alternatively, you can specify a user configuration file using the
-c
option (see Command-Line Options).
The most important use of your user configuration file is to tell RA how to connect to your own database server. Some examples are given in files with .ini suffix in the sample directory, e.g., postgresql.ini. You can copy one of these files to the default user configuration file location and modify it to your liking. See comments in these files for instructions on how to set appropriate values for your database system.
The user configuration file is divided into sections. The first
section should be named [DEFAULT]
. Subsequent sections can
provide additional customization that overrides the default. For
example, with the sample postgresql.ini, you will by default connect to PostgreSQL
server running locally on your computer, and you will need to give the
name of the database to connect to when you run RA (see
Command-Line Options).
Note
RA only installs the SQLite driver by default, so if you
want to use it for other database systems, you will first need to
install the appropriate drivers yourself. For example, to install
the PostgreSQL driver, simple use the command pip install
psycopg2
on your system (or pip3
if Python 3 is not the
default Python version on your system).
With the configuration file, you can do more. Suppose you set
db.database
under [DEFAULT]
, and add two more sections as
follows:
[DEFAULT]
# ... other settings remain the same...
db.database=beers
# ... other settings remain the same...
[play]
db.database=pokemongo
[production]
db.username=me
db.password=buzzword
db.host=my.server.com
db.port=5432
db.database=mydb
Then, running RA without a source argument will connect you to the
beers
database on the local server by default; running RA with
play
as the source argument will connect you to the local
pokemongo
database (in this case, using pokemongo
as source
would achieve the same effect); finally, running RA with
production
as source will connect you to the mydb
database
running on the remote server, using the credentials supplied under the
[production]
section.
Another use of the configuration file to declare to RA useful built-in functions support by your database system. This use is discussed further below.
Specification of Built-in Functions¶
Specifying a built-in function in the configuration file makes RA
aware of its signature and allows RA to check its input and out output
types (see also Type Checking), which helps catch errors in
queries. In fact, RA specifies all operators and functions that are
known to work across database vendors in its system
configuration file under the setting for
default_functions
. Additionally, RA looks for a setting for
functions
in the user configuration file, which can supplement or
override specifications in default_functions
.
In both settings, each function is declared with a single line of text, with the following format:
fname (
req_t1,
req_t2,
… ,
opt_t1?,
opt_t2?,
… ,
opt_tlast*) ->
return_t
Here, fname is the name of the function; upper-case function names
in the system configuration file are
reserved for special operators. The (possibly empty) sequence
req_t1, req_t2, … specifies the types of required arguments.
The (possibly empty) sequence opt_t1, opt_t2, …, each suffixed
by ?
, specifies the types of optional arguments. The optional
opt_tlast suffixed by *
specifies that the function can take an
arbitrary number of additional arguments of the given type at the end.
Finally, return_t specifies the result type of the function. These
types can be any of the basic value types supported by RA, as well as
any
(see Data Types and Operators).
You can also specify a function as an aggregate, by prefixing the
declaration line with keyword aggregate:
(before fname). An
aggregate function takes a (multi)set of input tuples, each with
attribute(s) matching the declared argument specification, and
computes a single output value of the declared return type. These
functions can be used in the aggregation operator \aggr
(see
Basic Usage).
Comments can be added with #
; text following #
will be
ignored. There cannot be empty lines between function declarations.
The order of declarations is important; see Type Checking for
details.
Warning
Recall from discussion in Type Checking that in general, a function can have multiple signatures. Suppose your user configuration file gives (at least) one declaration for an operator/function f already declared in the system configuration file. In this case, RA assumes that you intend to wipe out all system default declarations for f, and would like to use only your own declarations for f. (This behavior is necessitated by RA’s simple, order-based resolution discussed in Type Checking, because we need a way for user configuration to customize the order of declarations.) An implication of this rule is that you must remember to declare all alternative signatures for f in your user configuration file.