Introduction to SQL from Stefan. Not yet marked up, but will go into

author Thomas G. Lockhart

Tue, 19 Jan 1999 16:09:16 +0000 (16:09 +0000)

committer Thomas G. Lockhart

Tue, 19 Jan 1999 16:09:16 +0000 (16:09 +0000)
diff --git a/doc/src/sgml/sql.sgml b/doc/src/sgml/sql.sgml

new file mode 100644 (file)

index 0000000..08481bf
--- /dev/null
+++ b/doc/src/sgml/sql.sgml
@@ -0,0 +1,1126 @@
+ 
+  SQL
+
+  
+   
+    This chapter originally appeared as a part of 
+    Stefan Simkovics' Master's Thesis.
+
+
+    
+  
+
+  
+   SQL has become one of the most popular relational query languages all
+   over the world. 
+   The name "SQL" is an abbreviation for
+   Structured Query Language. 
+   In 1974 Donald Chamberlin and others defined the
+   language SEQUEL (Structured English Query Language) at IBM
+   Research. This language was first implemented in an IBM
+   prototype called SEQUEL-XRM in 1974-75. In 1976-77 a revised version
+   of SEQUEL called SEQUEL/2 was defined and the name was changed to SQL
+   subsequently.
+  
+
+  
+A new prototype called System R was developed by IBM in 1977. System R
+implemented a large subset of SEQUEL/2 (now SQL) and a number of
+changes were made to SQL during the project. System R was installed in
+a number of user sites, both internal IBM sites and also some selected
+customer sites. Thanks to the success and acceptance of System R at
+those user sites IBM started to develop commercial products that
+implemented the SQL language based on the System R technology.
+  
+
+  
+Over the next years IBM and a number of other vendors announced
+SQL products such as SQL/DS (IBM), DB2 (IBM), ORACLE (Oracle Corp.),
+DG/SQL (Data General Corp.) and SYBASE (Sybase Inc.).
+  
+
+  
+SQL is also an official standard now. In 1982 the American National
+Standards Institute (ANSI) chartered its Database Committee X3H2 to
+develop a proposal for a standard relational language. This proposal
+was ratified in 1986 and consisted essentially of the IBM dialect of
+SQL. In 1987 this ANSI standard was also accepted as an international
+standard by the International Organization for Standardization
+(ISO). This original standard version of SQL is often referred to,
+informally, as "SQL/86". In 1989 the original standard was extended
+and this new standard is often, again informally, referred to as
+"SQL/89". Also in 1989, a related standard called {\it Database
+Language Embedded SQL} was developed.
+  
+
+  
+   The ISO and ANSI committees have been working for many years on the
+   definition of a greatly expanded version of the original standard,
+   referred to informally as "SQL2" or "SQL/92". This version became a
+   ratified standard - "International Standard \mbox{ISO/IEC 9075:1992}, {\it
+   Database Language SQL}" - in late 1992. "SQL/92" is the version
+   normally meant when people refer to "the SQL standard". A detailed
+   description of "SQL/92" is given in \cite{date}. At the time of
+   writing this document a new standard informally referred to as "SQL3"
+   is under development. It is planned to make SQL a Turing-complete
+   language, i.e.\ all computable queries (e.g.\ recursive queries) will be
+   possible. This is a very complex task and therefore the completion of
+   the new standard cannot be expected before 1999.
+  
+
+  
+   The Relational Data Model
+
+  
+    As mentioned before, SQL is a relational language. That means it is
+    based on the "relational data model" first published by E.~F.~Codd in
+    1970. We will give a formal description of the relational model in
+    section \ref{formal_notion} {\it Formal Notion of the Relational Data
+    Model}, but first we want to look at it from a more intuitive
+    point of view.
+  
+
+  
+    A {\it relational database} is a database that is perceived by its
+    users as a {\it collection of tables} (and nothing else but tables).
+    A table consists of rows and columns where each row represents a
+    record and each column represents an attribute of the records
+    contained in the table. Figure \ref{supplier} shows an example of a
+    database consisting of three tables:
+\begin{itemize}
+\item SUPPLIER is a table storing the number
+(SNO), the name (SNAME) and the city (CITY) of a supplier.
+\item PART is a table storing the number (PNO), the name (PNAME) and
+the price (PRICE) of a part.
+\item SELLS stores information about which part (PNO) is sold by which
+supplier (SNO). It serves in a sense to connect the other two tables
+together.
+\end{itemize}
+%
+\begin{figure}[h]
+\begin{verbatim}
+   SUPPLIER   SNO |  SNAME  |  CITY      SELLS   SNO | PNO
+             -----+---------+--------           -----+-----
+               1  |  Smith  | London              1  |  1
+               2  |  Jones  | Paris               1  |  2
+               3  |  Adams  | Vienna              2  |  4
+               4  |  Blake  | Rome                3  |  1
+                                                  3  |  3
+                                                  4  |  2
+   PART       PNO |  PNAME  |  PRICE              4  |  3 
+             -----+---------+---------            4  |  4
+               1  |  Screw  |   10
+               2  |  Nut    |    8
+               3  |  Bolt   |   15
+               4  |  Cam    |   25
+\end{verbatim}
+\caption{The suppliers and parts database}
+\label{supplier}
+\end{figure}
+%
+The tables PART and SUPPLIER may be regarded as {\it entities} and
+SELLS may be regarded as a {\it relationship} between a particular
+part and a particular supplier. 
+
+As we will see later, SQL operates on tables like the ones just
+defined, but first we will study the theory of the relational
+model.
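The intuitive view above can be sketched in a few lines of Python. This is only an illustration, not part of the original chapter: each table is held as a list of named tuples, so that every element is one record (row) and every field is one attribute (column). The type names `Supplier`, `Part` and `Sells` are our own choice; the data is copied from the figure.

```python
from collections import namedtuple

# One named-tuple type per table; the field names are the column names.
Supplier = namedtuple("Supplier", ["SNO", "SNAME", "CITY"])
Part = namedtuple("Part", ["PNO", "PNAME", "PRICE"])
Sells = namedtuple("Sells", ["SNO", "PNO"])

SUPPLIER = [Supplier(1, "Smith", "London"), Supplier(2, "Jones", "Paris"),
            Supplier(3, "Adams", "Vienna"), Supplier(4, "Blake", "Rome")]
PART = [Part(1, "Screw", 10), Part(2, "Nut", 8),
        Part(3, "Bolt", 15), Part(4, "Cam", 25)]
SELLS = [Sells(1, 1), Sells(1, 2), Sells(2, 4), Sells(3, 1),
         Sells(3, 3), Sells(4, 2), Sells(4, 3), Sells(4, 4)]

# SELLS connects the other two tables: follow it from supplier 1 to part names.
parts_by_no = {p.PNO: p for p in PART}
sold_by_smith = [parts_by_no[s.PNO].PNAME for s in SELLS if s.SNO == 1]
print(sold_by_smith)
```

The lookup at the end shows in what sense SELLS is a relationship: it holds no descriptive data of its own, only the numbers that tie a supplier to a part.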
+
+\subsection{Formal Notion of the Relational Data Model}
+\label{formal_notion}
+The mathematical concept underlying the relational model is the
+set-theoretic {\it relation} which is a subset of the Cartesian
+product of a list of domains. This set-theoretic {\it relation} gives
+the model its name (do not confuse it with the relationship from the {\it
+Entity-Relationship model}). Formally a domain is simply a set of
+values. For example the set of integers is a domain. Also the set of
+character strings of length 20 and the real numbers are examples of
+domains.
+\begin{definition}
+The {\it Cartesian product} of domains $D_{1}, D_{2},\ldots, D_{k}$, written
+\mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$}, is the set of
+all $k$-tuples $(v_{1},v_{2},\ldots,v_{k})$ such that \mbox{$v_{1} \in
+D_{1}, v_{2} \in D_{2}, \ldots, v_{k} \in D_{k}$}.  
+\end{definition}
+For example, when we have $k=2$, $D_{1}=\{0,1\}$ and
+$D_{2}=\{a,b,c\}$, then $D_{1} \times D_{2}$ is
+$\{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)\}$.
+%
+\begin{definition}
+A {\it relation} is any subset of the Cartesian product of one or more
+domains: $R \subseteq$ \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$}.
+\end{definition}
+%
+For example $\{(0,a),(0,b),(1,a)\}$ is a relation; it is in fact a
+subset of $D_{1} \times D_{2}$ mentioned above.
+The members of a relation are called tuples. Each relation over some
+Cartesian product \mbox{$D_{1} \times D_{2} \times \ldots \times
+D_{k}$} is said to have arity $k$ and is therefore a set of $k$-tuples.
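The two definitions can be checked mechanically. The following Python sketch, our own illustration using only the standard library, builds the Cartesian product of the domains from the example, tests that the sample relation is a subset of it, and reads off the arity.

```python
from itertools import product

D1 = {0, 1}
D2 = {"a", "b", "c"}

# Cartesian product D1 x D2: every pair (v1, v2) with v1 in D1 and v2 in D2.
cartesian = set(product(D1, D2))

# A relation is any subset of the Cartesian product.
R = {(0, "a"), (0, "b"), (1, "a")}
is_relation = R <= cartesian

# The arity is the number of components of each tuple.
arity = len(next(iter(R)))

print(len(cartesian), is_relation, arity)
```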
+
+A relation can be viewed as a table (as we already did, remember
+figure \ref{supplier} {\it The suppliers and parts database}) where
+every tuple is represented by a row and every column corresponds to
+one component of a tuple. Giving names (called attributes) to the
+columns leads to the definition of a {\it relation scheme}.
+%
+\begin{definition}
+A {\it relation scheme} $R$ is a finite set of attributes
+\mbox{$\{A_{1},A_{2},\ldots,A_{k}\}$}. There is a domain $D_{i}$ for
+each attribute $A_{i}, 1 \le i \le k$ where the values of the
+attributes are taken from. We often write a relation scheme as
+\mbox{$R(A_{1},A_{2},\ldots,A_{k})$}.
+\end{definition}
+{\bf Note:} A {\it relation scheme} is just a kind of template,
+whereas a {\it relation} is an instance of a {\it relation
+scheme}. The {\it relation} consists of tuples (and can therefore be
+viewed as a table); the {\it relation scheme} does not.
+
+\subsubsection{Domains vs. Data Types}
+\label{domains}
+We often talked about {\it domains} in the last section. Recall that a
+domain is, formally, just a set of values (e.g., the set of integers or
+the real numbers). In terms of database systems we often talk of {\it
+data types} instead of domains. When we define a table we have to make
+a decision about which attributes to include. Additionally we
+have to decide which kind of data is going to be stored as
+attribute values. For example the values of SNAME from the table
+SUPPLIER will be character strings, whereas SNO will store
+integers. We define this by assigning a {\it data type} to each
+attribute. The type of SNAME will be VARCHAR(20) (this is the SQL type
+for character strings of length $\le$ 20), the type of SNO will be
+INTEGER. With the assignment of a {\it data type} we also have selected
+a domain for an attribute. The domain of SNAME is the set of all
+character strings of length $\le$ 20, the domain of SNO is the set of
+all integer numbers.
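Read as code, a domain is simply a membership test. The Python sketch below is an illustration under simplifying assumptions: the helper names are hypothetical, and the size limits that a real SQL INTEGER has are ignored. It checks whether a pair of values lies in the domains chosen for SNO and SNAME.

```python
def in_integer(value):
    """Domain of SNO: the integers (Python's bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool)

def in_varchar20(value):
    """Domain of SNAME: character strings of length <= 20."""
    return isinstance(value, str) and len(value) <= 20

def conforms(row):
    """True if (SNO, SNAME) takes each value from the domain of its attribute."""
    sno, sname = row
    return in_integer(sno) and in_varchar20(sname)

print(conforms((1, "Smith")))      # both values lie in their domains
print(conforms(("1", "Smith")))    # SNO is a string, not an integer
print(conforms((2, "x" * 21)))     # SNAME is longer than 20 characters
```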
+
+\section{Operations in the Relational Data Model}
+\label{operations}
+In section \ref{formal_notion} we defined the mathematical notion of
+the relational model. Now we know how data can be stored using a
+relational data model, but we do not yet know how to retrieve
+something from these tables. For example somebody
+could ask for the names of all suppliers that sell the part
+'Screw'. To express such requests, two rather different kinds of notation for
+operations on relations have been defined:
+%
+\begin{itemize}
+\item The {\it Relational Algebra} which is an algebraic notation,
+where queries are expressed by applying specialized operators to the
+relations.
+\item The {\it Relational Calculus} which is a logical notation,
+where queries are expressed by formulating some logical restrictions
+that the tuples in the answer must satisfy.
+\end{itemize}
+%
+\subsection{Relational Algebra}
+\label{rel_alg}
+The {\it Relational Algebra} was introduced by E.~F.~Codd in 1972. It
+consists of a set of operations on relations:
+\begin{itemize}
+\item SELECT ($\sigma$): extracts {\it tuples} from a relation that
+satisfy a given restriction. Let $R$ be a table that contains an attribute
+$A$. $\sigma_{A=a}(R) = \{t \in R \mid t(A) = a\}$ where $t$ denotes a
+tuple of $R$ and $t(A)$ denotes the value of attribute $A$ of tuple $t$.
+\item PROJECT ($\pi$): extracts specified {\it attributes} (columns) from a
+relation. Let $R$ be a relation that contains an attribute $X$. $\pi_{X}(R) =
+\{t(X) \mid t \in R\}$, where $t(X)$ denotes the value of attribute $X$ of
+tuple $t$.
+\item PRODUCT ($\times$): builds the Cartesian product of two
+relations. Let $R$ be a table with arity $k_{1}$ and let $S$ be a table with
+arity $k_{2}$. $R\times S$ is the set of all $(k_{1}+k_{2})$-tuples
+whose first $k_{1}$ components form a tuple in $R$ and whose last
+$k_{2}$ components form a tuple in $S$.
+\item UNION ($\cup$): builds the set-theoretic union of two
+tables. Given the tables $R$ and $S$ (both must have the same arity),
+the union $R \cup S$ is the set of tuples that are in $R$ or $S$ or
+both.
+\item INTERSECT ($\cap$): builds the set-theoretic intersection of two
+tables. Given the tables $R$ and $S$, $R \cap S$ is the set of tuples
+that are in $R$ and in $S$. We again require that $R$ and $S$ have the
+same arity.
+\item DIFFERENCE ($-$ or $\setminus$): builds the set difference of
+two tables. Let $R$ and $S$ again be two tables with the same
+arity. $R-S$ is the set of tuples in $R$ but not in $S$.
+\item JOIN ($\Join$): connects two tables by their common
+attributes. Let $R$ be a table with the attributes $A,B$ and $C$ and
+let $S$ be a table with the attributes $C,D$ and $E$. There is one
+attribute common to both relations, the attribute $C$. $R \Join S =
+\pi_{R.A,R.B,R.C,S.D,S.E}(\sigma_{R.C=S.C}(R \times S))$. What are we
+doing here? We first calculate the Cartesian product $R \times
+S$. Then we select those tuples whose values for the common
+attribute $C$ are equal ($\sigma_{R.C = S.C}$). Now we have a table
+that contains the attribute $C$ two times and we correct this by
+projecting out the duplicate column.
+\begin{example} 
+\label{join_example}
+Let's have a look at the tables that are produced by evaluating the steps
+necessary for a join. \\
+Let the following two tables be given:
+\begin{verbatim}
+         R   A | B | C      S   C | D | E       
+            ---+---+---        ---+---+---
+             1 | 2 | 3          3 | a | b       
+             4 | 5 | 6          6 | c | d              
+             7 | 8 | 9                                       
+\end{verbatim}
+First we calculate the Cartesian product $R \times S$ and get:
+\begin{verbatim}
+       R x S   A | B | R.C | S.C | D | E
+              ---+---+-----+-----+---+---
+               1 | 2 |  3  |  3  | a | b
+               1 | 2 |  3  |  6  | c | d
+               4 | 5 |  6  |  3  | a | b
+               4 | 5 |  6  |  6  | c | d
+               7 | 8 |  9  |  3  | a | b
+               7 | 8 |  9  |  6  | c | d
+\end{verbatim}
+\pagebreak
+After the selection $\sigma_{R.C=S.C}(R \times S)$ we get:
+\begin{verbatim}
+               A | B | R.C | S.C | D | E
+              ---+---+-----+-----+---+---
+               1 | 2 |  3  |  3  | a | b
+               4 | 5 |  6  |  6  | c | d
+\end{verbatim}
+To remove the duplicate column $S.C$ we project it out by the
+following operation: $\pi_{R.A,R.B,R.C,S.D,S.E}(\sigma_{R.C=S.C}(R
+\times S))$ and get:
+\begin{verbatim}
+                   A | B | C | D | E
+                  ---+---+---+---+---
+                   1 | 2 | 3 | a | b
+                   4 | 5 | 6 | c | d
+\end{verbatim}
+\end{example}
+\item DIVIDE ($\div$): Let $R$ be a table with the attributes $A,B,C$
+and $D$ and let $S$ be a table with the attributes $C$ and $D$. Then
+we define the division as: $R \div S = \{t \mid \forall t_{s} \in S~
+\exists t_{r} \in R$ such that
+$t_{r}(A,B)=t~\wedge~t_{r}(C,D)=t_{s}\}$ where $t_{r}(x,y)$ denotes a
+tuple of table $R$ that consists only of the components $x$ and
+$y$. Note that the tuple $t$ only consists of the components $A$ and
+$B$ of relation $R$.
+\begin{example}
+Given the following tables
+\begin{verbatim}
+          R   A | B | C | D        S   C | D
+             ---+---+---+---          ---+---
+              a | b | c | d            c | d
+              a | b | e | f            e | f
+              b | c | e | f
+              e | d | c | d
+              e | d | e | f
+              a | b | d | e
+\end{verbatim}
+$R \div S$ is derived as
+\begin{verbatim}
+                         A | B
+                        ---+---
+                         a | b
+                         e | d
+\end{verbatim}
+\end{example}
+\end{itemize}
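The operators SELECT, PROJECT and PRODUCT, and the JOIN built from them, can be sketched in Python as follows. This is a minimal illustration, not an efficient implementation: a relation is held as a list of dictionaries mapping attribute names to values, and the function names are our own. The data is that of the join example above.

```python
def select(rel, pred):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """pi: keep only the listed attributes and drop duplicate tuples."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def product(r, s, r_name="R", s_name="S"):
    """x: pair every tuple of r with every tuple of s, qualifying attribute names."""
    return [{**{f"{r_name}.{a}": v for a, v in tr.items()},
             **{f"{s_name}.{a}": v for a, v in ts.items()}}
            for tr in r for ts in s]

R = [{"A": 1, "B": 2, "C": 3}, {"A": 4, "B": 5, "C": 6}, {"A": 7, "B": 8, "C": 9}]
S = [{"C": 3, "D": "a", "E": "b"}, {"C": 6, "D": "c", "E": "d"}]

rxs = product(R, S)                                     # Cartesian product
matched = select(rxs, lambda t: t["R.C"] == t["S.C"])   # equal values for C
joined = project(matched, ["R.A", "R.B", "R.C", "S.D", "S.E"])

for t in joined:
    print(tuple(t.values()))
```

The three intermediate results `rxs`, `matched` and `joined` correspond to the three tables shown in the join example.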
+%
+For a more detailed description and definition of the relational
+algebra refer to \cite{ullman} or \cite{date86}.
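The DIVIDE operator is the least intuitive of the list, so a small sketch may help. In the Python illustration below (our own; the function name and the parameter `k` are assumptions) relations are sets of plain tuples, the first `k` components of a tuple of $R$ form the candidate result and the remaining components must match a tuple of $S$. The data is that of the division example.

```python
def divide(r, s, k):
    """Return the k-component tuples t such that t + ts is in r for every ts in s."""
    candidates = {t[:k] for t in r}
    return {t for t in candidates if all(t + ts in r for ts in s)}

R = {("a", "b", "c", "d"), ("a", "b", "e", "f"), ("b", "c", "e", "f"),
     ("e", "d", "c", "d"), ("e", "d", "e", "f"), ("a", "b", "d", "e")}
S = {("c", "d"), ("e", "f")}

quotient = divide(R, S, 2)
print(sorted(quotient))
```

The pair (b, c) is rejected because it occurs in $R$ only together with (e, f), not with (c, d).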
+
+\begin{example}
+\label{suppl_rel_alg}
+Recall that we formulated all those relational operators to be able to
+retrieve data from the database. Let's return to our example of
+section \ref{operations} where someone wanted to know the names of all
+suppliers that sell the part 'Screw'. This question can be answered
+using relational algebra by the following operation:
+\begin{displaymath}
+\pi_{SUPPLIER.SNAME}(\sigma_{PART.PNAME='Screw'}(SUPPLIER \Join SELLS
+\Join PART))
+\end{displaymath}
+We call such an operation a query. If we evaluate the above query
+against the tables from figure \ref{supplier} {\it The suppliers and
+parts database} we will obtain the following result:
+\begin{verbatim}
+                             SNAME
+                            -------
+                             Smith
+                             Adams
+\end{verbatim}
+\end{example}
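The same query can be traced step by step in Python. The sketch below is an illustration only; the helper `natural_join` is our own name and joins two relations on all attributes they have in common, as the JOIN operator does.

```python
SUPPLIER = [{"SNO": 1, "SNAME": "Smith", "CITY": "London"},
            {"SNO": 2, "SNAME": "Jones", "CITY": "Paris"},
            {"SNO": 3, "SNAME": "Adams", "CITY": "Vienna"},
            {"SNO": 4, "SNAME": "Blake", "CITY": "Rome"}]
PART = [{"PNO": 1, "PNAME": "Screw", "PRICE": 10},
        {"PNO": 2, "PNAME": "Nut", "PRICE": 8},
        {"PNO": 3, "PNAME": "Bolt", "PRICE": 15},
        {"PNO": 4, "PNAME": "Cam", "PRICE": 25}]
SELLS = [{"SNO": s, "PNO": p} for s, p in
         [(1, 1), (1, 2), (2, 4), (3, 1), (3, 3), (4, 2), (4, 3), (4, 4)]]

def natural_join(r, s):
    """Combine the tuples of r and s that agree on all common attributes."""
    out = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()
            if all(tr[a] == ts[a] for a in common):
                out.append({**tr, **ts})
    return out

joined = natural_join(natural_join(SUPPLIER, SELLS), PART)   # SUPPLIER join SELLS join PART
screws = [t for t in joined if t["PNAME"] == "Screw"]        # selection
names = sorted({t["SNAME"] for t in screws})                 # projection on SNAME
print(names)
```

A relation is a set, so the order of the two names carries no meaning; the sketch sorts them only to make the output reproducible.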
+\subsection{Relational Calculus}
+\label{rel_calc}
+The relational calculus is based on {\it first-order logic}. There are
+two variants of the relational calculus:
+%
+\begin{itemize} 
+\item The {\it Domain Relational Calculus} (DRC), where variables
+stand for components (attributes) of the tuples.
+\item The {\it Tuple Relational Calculus} (TRC), where variables stand
+for tuples.
+\end{itemize}
+%
+We discuss only the tuple relational calculus because it is
+the one underlying most relational languages. For a detailed
+discussion of DRC (and also TRC) see \cite{date86} or \cite{ullman}.
+
+\subsubsection{Tuple Relational Calculus}
+The queries used in TRC are of the following form:
+\begin{displaymath}
+\{x(A) \mid F(x)\}
+\end{displaymath}
+where $x$ is a tuple variable, $A$ is a set of attributes and $F$ is a
+formula. The resulting relation consists of all tuples $t(A)$ that satisfy
+$F(t)$.
+\begin{example}
+If we want to answer the question from example \ref{suppl_rel_alg}
+using TRC we formulate the following query:
+\begin{displaymath}
+\begin{array}{lcll}
+\{x(SNAME) & \mid & x \in SUPPLIER~\wedge & \nonumber\\ 
+&  & \exists y \in SELLS\ \exists z \in PART & (y(SNO)=x(SNO)~\wedge \nonumber\\
+&  &  &~ z(PNO)=y(PNO)~\wedge \nonumber\\
+&  &  &~ z(PNAME)='Screw')\} \nonumber
+\end{array}
+\end{displaymath}
+Evaluating the query against the tables from figure \ref{supplier}
+{\it The suppliers and parts database} again leads to the same result
+as in example \ref{suppl_rel_alg}.
+\end{example}
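A TRC query reads almost like a set comprehension, which makes Python a convenient notation for trying it out. The sketch below is our own illustration: the tuple variables $x$, $y$ and $z$ become loop variables, the existential quantifiers become `any`, and the formula becomes the condition. Only the attributes needed by the query are stored.

```python
SUPPLIER = [{"SNO": 1, "SNAME": "Smith"}, {"SNO": 2, "SNAME": "Jones"},
            {"SNO": 3, "SNAME": "Adams"}, {"SNO": 4, "SNAME": "Blake"}]
PART = [{"PNO": 1, "PNAME": "Screw"}, {"PNO": 2, "PNAME": "Nut"},
        {"PNO": 3, "PNAME": "Bolt"}, {"PNO": 4, "PNAME": "Cam"}]
SELLS = [{"SNO": s, "PNO": p} for s, p in
         [(1, 1), (1, 2), (2, 4), (3, 1), (3, 3), (4, 2), (4, 3), (4, 4)]]

# {x(SNAME) | x in SUPPLIER and there exist y in SELLS, z in PART such that ...}
names = {x["SNAME"] for x in SUPPLIER
         if any(y["SNO"] == x["SNO"] and
                z["PNO"] == y["PNO"] and
                z["PNAME"] == "Screw"
                for y in SELLS for z in PART)}
print(sorted(names))
```

Note that the comprehension states only which tuples qualify; it says nothing about the order in which joins and selections are carried out.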
+
+\subsection{Relational Algebra vs. Relational Calculus}
+\label{alg_vs_calc}
+The relational algebra and the relational calculus have the same {\it
+expressive power} i.e.\ all queries that can be formulated using
+relational algebra can also be formulated using the relational
+calculus and vice versa. This was first proved by E.~F.~Codd in
+1972. This proof is based on an algorithm -"Codd's reduction
+algorithm"- by which an arbitrary expression of the relational
+calculus can be reduced to a semantically equivalent expression of
+relational algebra. For a more detailed discussion on that refer to
+\cite{date86} and
+\cite{ullman}. 
+
+It is sometimes said that languages based on the relational calculus
+are "higher level" or "more declarative" than languages based on
+relational algebra because the algebra (partially) specifies the order
+of operations while the calculus leaves it to a compiler or
+interpreter to determine the most efficient order of evaluation.
+
+
+\section{The SQL Language}
+\label{sqllanguage}
+%
+Like most modern relational languages, SQL is based on the tuple
+relational calculus. As a result every query that can be formulated
+using the tuple relational calculus (or equivalently, relational
+algebra) can also be formulated using SQL. There are, however,
+capabilities beyond the scope of relational algebra or calculus. Here
+is a list of some additional features provided by SQL that are not
+part of relational algebra or calculus:
+\pagebreak
+%
+\begin{itemize}
+\item Commands for insertion, deletion or modification of data.
+\item Arithmetic capability: In SQL it is possible to involve
+arithmetic operations as well as comparisons, e.g. $A < B + 3$. Note
+that $+$ or other arithmetic operators appear neither in relational
+algebra nor in relational calculus.
+\item Assignment and Print Commands: It is possible to print a
+relation constructed by a query and to assign a computed relation to a
+relation name.
+\item Aggregate Functions: Operations such as {\it average}, {\it
+sum}, {\it max}, \ldots can be applied to columns of a relation to
+obtain a single quantity.
+\end{itemize}
+%
+\subsection{Select}
+\label{select}
+The most often used command in SQL is the SELECT statement that is
+used to retrieve data. The syntax is:
+\begin{verbatim}
+   SELECT [ALL|DISTINCT] 
+          { * | expr_1 [AS c_alias_1] [, ... 
+                [, expr_k [AS c_alias_k]]]}
+   FROM table_name_1 [t_alias_1] 
+        [, ... [, table_name_n [t_alias_n]]]
+   [WHERE condition]
+   [GROUP BY name_of_attr_i 
+             [,... [, name_of_attr_j]] [HAVING condition]]
+   [{UNION | INTERSECT | EXCEPT} SELECT ...]
+   [ORDER BY name_of_attr_i [ASC|DESC] 
+             [, ... [, name_of_attr_j [ASC|DESC]]]];
+\end{verbatim}
+Now we will illustrate the complex syntax of the SELECT statement
+with various examples. The tables used for the examples are defined in
+figure \ref{supplier} {\it The suppliers and parts database}.
+%
+\subsubsection{Simple Selects}
+\begin{example}
+Here are some simple examples using a SELECT statement: \\
+\\
+To retrieve all tuples from table PART where the attribute PRICE is
+greater than 10 we formulate the following query
+\begin{verbatim}
+   SELECT * 
+   FROM PART
+   WHERE PRICE > 10;
+\end{verbatim}
+and get the table:
+\begin{verbatim}
+                   PNO |  PNAME  |  PRICE
+                  -----+---------+--------
+                    3  |  Bolt   |   15
+                    4  |  Cam    |   25
+\end{verbatim}
+%
+Using "$*$" in the SELECT statement will deliver all attributes from
+the table. If we want to retrieve only the attributes PNAME and PRICE
+from table PART we use the statement:
+\begin{verbatim}
+   SELECT PNAME, PRICE 
+   FROM PART
+   WHERE PRICE > 10;
+\end{verbatim}
+\pagebreak
+\noindent In this case the result is:
+\begin{verbatim}
+                      PNAME  |  PRICE
+                     --------+--------
+                      Bolt   |   15
+                      Cam    |   25
+\end{verbatim}
+Note that the SQL SELECT corresponds to the "projection" in relational
+algebra, not to the "selection" (see section \ref{rel_alg} {\it
+Relational Algebra}).
+\\ \\
+The qualifications in the WHERE clause can also be logically connected
+using the keywords OR, AND and NOT:
+\begin{verbatim}
+   SELECT PNAME, PRICE 
+   FROM PART
+   WHERE PNAME = 'Bolt' AND
+         (PRICE = 0 OR PRICE <= 15);
+\end{verbatim}
+will lead to the result:
+\begin{verbatim}
+                      PNAME  |  PRICE
+                     --------+--------
+                      Bolt   |   15
+\end{verbatim}
+Arithmetic operations may be used in the {\it selectlist} and in the WHERE
+clause. For example if we want to know how much it would cost if we
+take two pieces of a part we could use the following query:
+\begin{verbatim}
+   SELECT PNAME, PRICE * 2 AS DOUBLE
+   FROM PART
+   WHERE PRICE * 2 < 50;
+\end{verbatim}
+and we get:
+\begin{verbatim}
+                      PNAME  |  DOUBLE
+                     --------+---------
+                      Screw  |    20
+                      Nut    |    16
+                      Bolt   |    30
+\end{verbatim}
+Note that the word DOUBLE after the keyword AS is the new title of the
+second column. This technique can be used for every element of the
+{\it selectlist} to assign a new title to the resulting column. This new title
+is often referred to as an alias. The alias cannot be referenced in the WHERE
+clause, which is why the condition above repeats the expression PRICE * 2.
+\end{example}
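The simple selects can be tried out without a database server. The sketch below is an illustration under stated assumptions: it uses Python's standard `sqlite3` module with an in-memory database, declares PRICE as INTEGER, adds ORDER BY only to make the row order reproducible, and names the alias DOUBLE_PRICE rather than DOUBLE to stay clear of the type name. None of these choices comes from the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PART (PNO INTEGER, PNAME VARCHAR(20), PRICE INTEGER)")
conn.executemany("INSERT INTO PART VALUES (?, ?, ?)",
                 [(1, "Screw", 10), (2, "Nut", 8), (3, "Bolt", 15), (4, "Cam", 25)])

# "*" delivers all attributes of the qualifying tuples.
all_columns = conn.execute(
    "SELECT * FROM PART WHERE PRICE > 10 ORDER BY PNO").fetchall()

# Naming attributes in the select list restricts the result to those columns.
two_columns = conn.execute(
    "SELECT PNAME, PRICE FROM PART WHERE PRICE > 10 ORDER BY PNO").fetchall()

# Arithmetic in the select list and in the WHERE clause, with a column alias.
doubled = conn.execute(
    "SELECT PNAME, PRICE * 2 AS DOUBLE_PRICE FROM PART "
    "WHERE PRICE * 2 < 50 ORDER BY PNO").fetchall()

print(all_columns)
print(two_columns)
print(doubled)
```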
+
+\subsubsection{Joins}
+\begin{example} The following example shows how {\it joins} are
+realized in SQL: \\ \\
+To join the three tables SUPPLIER, PART and SELLS over their common
+attributes we formulate the following statement:
+\begin{verbatim}
+   SELECT S.SNAME, P.PNAME
+   FROM SUPPLIER S, PART P, SELLS SE
+   WHERE S.SNO = SE.SNO AND
+         P.PNO = SE.PNO;
+\end{verbatim}
+\pagebreak
+\noindent and get the following table as a result:
+\begin{verbatim}
+                       SNAME | PNAME
+                      -------+-------
+                       Smith | Screw
+                       Smith | Nut
+                       Jones | Cam
+                       Adams | Screw
+                       Adams | Bolt
+                       Blake | Nut
+                       Blake | Bolt
+                       Blake | Cam
+\end{verbatim}
+In the FROM clause we introduced an alias name for every relation
+because there are identically named attributes (SNO and PNO) among the
+relations. Now we can distinguish between these attributes
+by simply prefixing the attribute name with the alias name followed by
+a dot. The join is calculated in the same way as shown in example
+\ref{join_example}. First the Cartesian product $SUPPLIER\times PART
+\times SELLS$ is derived. Then only those tuples satisfying the
+conditions given in the WHERE clause are selected (i.e.\ the identically
+named attributes have to be equal). Finally we project out all
+columns but S.SNAME and P.PNAME. 
+\end{example}
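The join can be reproduced with Python's standard `sqlite3` module. This is a sketch under assumptions: the helper `make_db` is our own, the column types other than those of SNO and SNAME are guesses, and ORDER BY is added only to fix the row order. The first query counts the Cartesian product that the join conceptually starts from; a real system is free to evaluate the join differently.

```python
import sqlite3

def make_db():
    """Build the suppliers and parts database in memory (column types partly assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE SUPPLIER (SNO INTEGER, SNAME VARCHAR(20), CITY VARCHAR(20));
        CREATE TABLE PART (PNO INTEGER, PNAME VARCHAR(20), PRICE INTEGER);
        CREATE TABLE SELLS (SNO INTEGER, PNO INTEGER);
        INSERT INTO SUPPLIER VALUES (1,'Smith','London'), (2,'Jones','Paris'),
                                    (3,'Adams','Vienna'), (4,'Blake','Rome');
        INSERT INTO PART VALUES (1,'Screw',10), (2,'Nut',8), (3,'Bolt',15), (4,'Cam',25);
        INSERT INTO SELLS VALUES (1,1), (1,2), (2,4), (3,1), (3,3), (4,2), (4,3), (4,4);
    """)
    return conn

conn = make_db()

# Size of the Cartesian product SUPPLIER x PART x SELLS: 4 * 4 * 8 tuples.
product_size = conn.execute(
    "SELECT COUNT(*) FROM SUPPLIER, PART, SELLS").fetchone()[0]

# The WHERE clause keeps only the tuples whose SNO and PNO values match.
joined = conn.execute("""
    SELECT S.SNAME, P.PNAME
    FROM SUPPLIER S, PART P, SELLS SE
    WHERE S.SNO = SE.SNO AND P.PNO = SE.PNO
    ORDER BY S.SNO, P.PNO
""").fetchall()

print(product_size)
print(joined)
```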
+%
+\subsubsection{Aggregate Operators}
+SQL provides aggregate operators (e.g. AVG, COUNT, SUM, MIN, MAX) that
+take the name of an attribute as an argument. The value of the
+aggregate operator is calculated over all values of the specified
+attribute (column) of the whole table. If groups are specified in the
+query the calculation is done only over the values of a group (see next
+section).
+
+\begin{example}
+If we want to know the average cost of all parts in table PART we use
+the following query:
+\begin{verbatim}
+   SELECT AVG(PRICE) AS AVG_PRICE
+   FROM PART;
+\end{verbatim}
+The result is:
+\begin{verbatim}
+                         AVG_PRICE
+                        -----------
+                           14.5
+\end{verbatim}
+If we want to know how many parts are stored in table PART we use
+the statement:
+\begin{verbatim}
+   SELECT COUNT(PNO)
+   FROM PART;
+\end{verbatim}
+and get:
+\begin{verbatim}
+                           COUNT
+                          -------
+                             4
+\end{verbatim}
+\end{example}
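Both aggregate queries can be checked with Python's standard `sqlite3` module. The sketch assumes an in-memory database and an INTEGER column for PRICE; the average is recomputed by hand to show what AVG does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PART (PNO INTEGER, PNAME VARCHAR(20), PRICE INTEGER)")
conn.executemany("INSERT INTO PART VALUES (?, ?, ?)",
                 [(1, "Screw", 10), (2, "Nut", 8), (3, "Bolt", 15), (4, "Cam", 25)])

# Each aggregate collapses one column of the whole table into a single value.
avg_price = conn.execute("SELECT AVG(PRICE) AS AVG_PRICE FROM PART").fetchone()[0]
part_count = conn.execute("SELECT COUNT(PNO) FROM PART").fetchone()[0]

# The same average computed by hand: (10 + 8 + 15 + 25) / 4.
prices = [row[0] for row in conn.execute("SELECT PRICE FROM PART")]
by_hand = sum(prices) / len(prices)

print(avg_price, part_count, by_hand)
```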
+
+\subsubsection{Aggregation by Groups}
+SQL allows one to partition the tuples of a table into groups. Then the
+aggregate operators described above can be applied to the groups
+(i.e.\ the value of the aggregate operator is no longer calculated over
+all the values of the specified column but over all values of a
+group). Thus the aggregate operator is evaluated individually for every
+group.
+\\ \\
+The partitioning of the tuples into groups is done by using the
+keywords \mbox{GROUP BY} followed by a list of attributes that define the
+groups. If we have {\tt GROUP BY $A_{1}, \ldots, A_{k}$} we partition
+the relation into groups, such that two tuples are in the same group
+if and only if they agree on all the attributes $A_{1}, \ldots,
+A_{k}$.
+\begin{example}
+If we want to know how many parts are sold by every supplier we
+formulate the query:
+\begin{verbatim}
+   SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
+   FROM SUPPLIER S, SELLS SE
+   WHERE S.SNO = SE.SNO
+   GROUP BY S.SNO, S.SNAME;
+\end{verbatim}
+and get:
+\begin{verbatim}
+                     SNO | SNAME | COUNT
+                    -----+-------+-------
+                      1  | Smith |   2
+                      2  | Jones |   1
+                      3  | Adams |   2
+                      4  | Blake |   3
+\end{verbatim}
+Now let's have a look at what is happening here: \\
+First the join of the
+tables SUPPLIER and SELLS is derived:
+\begin{verbatim}
+                  S.SNO | S.SNAME | SE.PNO
+                 -------+---------+--------
+                    1   |  Smith  |   1
+                    1   |  Smith  |   2
+                    2   |  Jones  |   4
+                    3   |  Adams  |   1
+                    3   |  Adams  |   3
+                    4   |  Blake  |   2
+                    4   |  Blake  |   3
+                    4   |  Blake  |   4
+\end{verbatim}
+Next we partition the tuples into groups by putting all tuples
+together that agree on both attributes S.SNO and S.SNAME:
+\begin{verbatim}
+                  S.SNO | S.SNAME | SE.PNO
+                 -------+---------+--------
+                    1   |  Smith  |   1
+                                  |   2
+                 --------------------------
+                    2   |  Jones  |   4
+                 --------------------------
+                    3   |  Adams  |   1
+                                  |   3
+                 --------------------------
+                    4   |  Blake  |   2
+                                  |   3
+                                  |   4
+\end{verbatim}
+In our example we get four groups and now we can apply the aggregate
+operator COUNT to every group leading to the total result of the query
+given above.
+\end{example}
+%
+Note that for the result of a query using GROUP BY and aggregate
+operators to make sense, the attributes grouped by must also appear in
+the {\it selectlist}. All further attributes not appearing in the GROUP
+BY clause can only be selected by using an aggregate function. On
+the other hand you cannot use aggregate functions on attributes
+appearing in the GROUP BY clause.
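The grouping steps can be replayed by hand and compared with what GROUP BY returns. The sketch below is an illustration using Python's standard `sqlite3` module; only the two tables needed are created, the column type of SNAME follows the text, and ORDER BY is our addition to fix the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SUPPLIER (SNO INTEGER, SNAME VARCHAR(20));
    CREATE TABLE SELLS (SNO INTEGER, PNO INTEGER);
    INSERT INTO SUPPLIER VALUES (1,'Smith'), (2,'Jones'), (3,'Adams'), (4,'Blake');
    INSERT INTO SELLS VALUES (1,1), (1,2), (2,4), (3,1), (3,3), (4,2), (4,3), (4,4);
""")

# Step 1: the join of SUPPLIER and SELLS.
joined = conn.execute("""
    SELECT S.SNO, S.SNAME, SE.PNO
    FROM SUPPLIER S, SELLS SE
    WHERE S.SNO = SE.SNO
    ORDER BY S.SNO, SE.PNO
""").fetchall()

# Step 2: partition the tuples; tuples that agree on (SNO, SNAME) share a group.
groups = {}
for sno, sname, pno in joined:
    groups.setdefault((sno, sname), []).append(pno)

# Step 3: apply COUNT to every group.
by_hand = [(sno, sname, len(pnos)) for (sno, sname), pnos in groups.items()]

# The same result from SQL in one statement.
grouped = conn.execute("""
    SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
    FROM SUPPLIER S, SELLS SE
    WHERE S.SNO = SE.SNO
    GROUP BY S.SNO, S.SNAME
    ORDER BY S.SNO
""").fetchall()

print(groups)
print(grouped)
```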
+
+\subsubsection{Having}
+
+The HAVING clause works much like the WHERE clause and is used to
+consider only those groups satisfying the qualification given in the
+HAVING clause. The expressions allowed in the HAVING clause must
+involve aggregate functions. Every expression using only plain
+attributes belongs in the WHERE clause. On the other hand, every
+expression involving an aggregate function must be placed in the HAVING
+clause. 
+\begin{example}
+If we want only those suppliers selling more than one part we use the
+query:
+\begin{verbatim}
+   SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
+   FROM SUPPLIER S, SELLS SE
+   WHERE S.SNO = SE.SNO
+   GROUP BY S.SNO, S.SNAME
+   HAVING COUNT(SE.PNO) > 1;
+\end{verbatim}
+and get:
+\begin{verbatim}
+                     SNO | SNAME | COUNT
+                    -----+-------+-------
+                      1  | Smith |   2
+                      3  | Adams |   2
+                      4  | Blake |   3
+\end{verbatim}
+\end{example}
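The division of labour between WHERE and HAVING shows up clearly when the query is run. In the sketch below (an illustration using Python's standard `sqlite3` module; ORDER BY is our addition) the WHERE clause filters individual tuples before grouping, while the HAVING clause filters whole groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SUPPLIER (SNO INTEGER, SNAME VARCHAR(20));
    CREATE TABLE SELLS (SNO INTEGER, PNO INTEGER);
    INSERT INTO SUPPLIER VALUES (1,'Smith'), (2,'Jones'), (3,'Adams'), (4,'Blake');
    INSERT INTO SELLS VALUES (1,1), (1,2), (2,4), (3,1), (3,3), (4,2), (4,3), (4,4);
""")

# WHERE joins the tuples, GROUP BY partitions them, HAVING drops small groups.
more_than_one = conn.execute("""
    SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
    FROM SUPPLIER S, SELLS SE
    WHERE S.SNO = SE.SNO
    GROUP BY S.SNO, S.SNAME
    HAVING COUNT(SE.PNO) > 1
    ORDER BY S.SNO
""").fetchall()

print(more_than_one)
```

Jones forms a group of size one and is therefore the only supplier missing from the result.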
+
+\subsubsection{Subqueries}
+In the WHERE and HAVING clauses the use of subqueries (subselects) is
+allowed in every place where a value is expected. In this case the
+value must be derived by evaluating the subquery first. The usage of
+subqueries extends the expressive power of SQL.
+\begin{example}
+If we want to know all parts having a greater price than the part
+named 'Screw' we use the query:
+\begin{verbatim}
+   SELECT * 
+   FROM PART 
+   WHERE PRICE > (SELECT PRICE FROM PART
+                  WHERE PNAME='Screw');
+\end{verbatim}
+The result is:
+\begin{verbatim}
+                   PNO |  PNAME  |  PRICE
+                  -----+---------+--------
+                    3  |  Bolt   |   15
+                    4  |  Cam    |   25
+\end{verbatim}
+When we look at the above query we can see
+the keyword SELECT two times. The first one at the beginning of the
+query - we will refer to it as the outer SELECT - and the one in the WHERE
+clause which begins a nested query - we will refer to it as the inner
+SELECT. For every tuple of the outer SELECT the inner SELECT has to be
+evaluated. After every evaluation we know the price of the part named
+'Screw' and we can check whether the price of the current tuple is
+greater. 
+\\ \\
+\noindent If we want to know all suppliers that do not sell any part 
+(e.g. to be able to remove these suppliers from the database) we use:
+\begin{verbatim}
+   SELECT * 
+   FROM SUPPLIER S
+   WHERE NOT EXISTS
+             (SELECT * FROM SELLS SE
+              WHERE SE.SNO = S.SNO);
+\end{verbatim}
+In our example the result will be empty because every supplier sells
+at least one part. Note that we use S.SNO from the outer SELECT within
+the WHERE clause of the inner SELECT. As described above, the subquery
+is evaluated for every tuple from the outer query, i.e.\ the value for
+S.SNO is always taken from the current tuple of the outer SELECT. 
+\end{example}
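Both subqueries can be run with Python's standard `sqlite3` module. The sketch is an illustration only. Because the NOT EXISTS query returns nothing on the example data, it is run a second time after inserting one clearly hypothetical supplier that sells no part; that row is our own invention and does not belong to the suppliers and parts database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SUPPLIER (SNO INTEGER, SNAME VARCHAR(20), CITY VARCHAR(20));
    CREATE TABLE PART (PNO INTEGER, PNAME VARCHAR(20), PRICE INTEGER);
    CREATE TABLE SELLS (SNO INTEGER, PNO INTEGER);
    INSERT INTO SUPPLIER VALUES (1,'Smith','London'), (2,'Jones','Paris'),
                                (3,'Adams','Vienna'), (4,'Blake','Rome');
    INSERT INTO PART VALUES (1,'Screw',10), (2,'Nut',8), (3,'Bolt',15), (4,'Cam',25);
    INSERT INTO SELLS VALUES (1,1), (1,2), (2,4), (3,1), (3,3), (4,2), (4,3), (4,4);
""")

# The inner SELECT yields the price of 'Screw'; the outer SELECT compares against it.
pricier = conn.execute("""
    SELECT * FROM PART
    WHERE PRICE > (SELECT PRICE FROM PART WHERE PNAME = 'Screw')
    ORDER BY PNO
""").fetchall()

# The inner SELECT refers to S.SNO of the current tuple of the outer SELECT.
no_sales_query = """
    SELECT * FROM SUPPLIER S
    WHERE NOT EXISTS (SELECT * FROM SELLS SE WHERE SE.SNO = S.SNO)
"""
before = conn.execute(no_sales_query).fetchall()

# Hypothetical supplier, not part of the figure, added only to obtain a non-empty result.
conn.execute("INSERT INTO SUPPLIER VALUES (5, 'Hypothetical', 'Nowhere')")
after = conn.execute(no_sales_query).fetchall()

print(pricier)
print(before, after)
```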
+
+\subsubsection{Union, Intersect, Except}
+
+These operations calculate the union, intersection and set-theoretic
+difference of the tuples derived by two subqueries:
+\begin{example}
+The following query is an example for UNION:
+\begin{verbatim}
+   SELECT S.SNO, S.SNAME, S.CITY
+   FROM SUPPLIER S
+   WHERE S.SNAME = 'Jones'
+   UNION
+   SELECT S.SNO, S.SNAME, S.CITY
+   FROM SUPPLIER S
+   WHERE S.SNAME = 'Adams';    
+\end{verbatim}
+gives the result:
+\begin{verbatim}
+                     SNO | SNAME |  CITY
+                    -----+-------+--------
+                      2  | Jones | Paris
+                      3  | Adams | Vienna
+\end{verbatim}
+Here is an example for INTERSECT:
+\begin{verbatim}
+   SELECT S.SNO, S.SNAME, S.CITY
+   FROM SUPPLIER S
+   WHERE S.SNO > 1
+   INTERSECT
+   SELECT S.SNO, S.SNAME, S.CITY
+   FROM SUPPLIER S
+   WHERE S.SNO < 3;
+\end{verbatim}
+gives the result:
+\begin{verbatim}
+                     SNO | SNAME |  CITY
+                    -----+-------+--------
+                      2  | Jones | Paris
+\end{verbatim}
+The only tuple returned by both parts of the query is the one having $SNO=2$.
+\pagebreak
+
+\noindent Finally an example for EXCEPT:
+\begin{verbatim}
+   SELECT S.SNO, S.SNAME, S.CITY
+   FROM SUPPLIER S
+   WHERE S.SNO > 1
+   EXCEPT
+   SELECT S.SNO, S.SNAME, S.CITY
+   FROM SUPPLIER S
+   WHERE S.SNO > 3;
+\end{verbatim}
+gives the result:
+\begin{verbatim}
+                     SNO | SNAME |  CITY
+                    -----+-------+--------
+                      2  | Jones | Paris
+                      3  | Adams | Vienna
+\end{verbatim}
+\end{example}
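+
+\noindent Because both parts of the EXCEPT query above select the same
+attributes from the same relation, its meaning can be checked against a
+single SELECT with a combined condition. This is only a sketch, assuming
+that SUPPLIER contains no duplicate tuples and that SNO is never null:
+\begin{verbatim}
+   -- Tuples satisfying the first condition but not the second
+   SELECT S.SNO, S.SNAME, S.CITY
+   FROM SUPPLIER S
+   WHERE S.SNO > 1 AND NOT (S.SNO > 3);
+\end{verbatim}
+Under these assumptions the query selects the tuples with SNO 2 and 3,
+the same tuples as the EXCEPT query.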
+%
+\subsection{Data Definition}
+\label{datadef}
+%
+There is a set of commands used for data definition included in the
+SQL language. 
+
+\subsubsection{Create Table}
+\label{create}
+The most fundamental command for data definition is the
+one that creates a new relation (a new table). The syntax of the
+CREATE TABLE command is:
+%
+\begin{verbatim}
+   CREATE TABLE <table_name>
+                (<name_of_attr_1> <type_of_attr_1>
+                 [, <name_of_attr_2> <type_of_attr_2>
+                 [, ...]]);
+\end{verbatim}
+%
+\begin{example}
+To create the tables defined in figure \ref{supplier} the
+following SQL statements are used:
+\begin{verbatim}
+   CREATE TABLE SUPPLIER
+                (SNO   INTEGER,
+                 SNAME VARCHAR(20),
+                 CITY  VARCHAR(20));
+   
+   CREATE TABLE PART
+                (PNO   INTEGER,
+                 PNAME VARCHAR(20),
+                 PRICE DECIMAL(4 , 2));
+\end{verbatim}
+\begin{verbatim}
+   CREATE TABLE SELLS
+                (SNO INTEGER,
+                 PNO INTEGER);
+\end{verbatim}
+\end{example}
+
+%
+\subsubsection{Data Types in SQL}
+The following is a list of some data types that are supported by SQL:
+\begin{itemize}
+\item INTEGER: signed fullword binary integer (31 bits precision).
+\item SMALLINT: signed halfword binary integer (15 bits precision).
+\item DECIMAL ($p \lbrack,q\rbrack $): signed packed decimal number of $p$
+digits precision, $q$ of which are assumed to be to the right of the decimal
+point $(15\ge p \ge q \ge 0)$. If $q$ is omitted it is assumed to be 0.
+\item FLOAT: signed doubleword floating point number.
+\item CHAR($n$): fixed length character string of length $n$.
+\item VARCHAR($n$): varying length character string of maximum length
+$n$.
+\end{itemize}
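+
+\noindent To illustrate how these types appear in a table definition,
+here is a small sketch. The table TYPE\_DEMO and its attributes are
+hypothetical and are not part of the suppliers and parts database:
+\begin{verbatim}
+   CREATE TABLE TYPE_DEMO
+                (ID     INTEGER,        -- whole number
+                 QTY    SMALLINT,       -- small whole number
+                 PRICE  DECIMAL(6, 2),  -- 6 digits, 2 after the point
+                 WEIGHT FLOAT,          -- floating point number
+                 CODE   CHAR(3),        -- always 3 characters
+                 DESCR  VARCHAR(40));   -- at most 40 characters
+\end{verbatim}
+With $p=6$ and $q=2$ the attribute PRICE has four digits to the left of
+the decimal point, so the largest value it can hold is 9999.99.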
+
+\subsubsection{Create Index}
+Indices are used to speed up access to a relation. If a relation $R$
+has an index on attribute $A$ then we can retrieve all tuples $t$
+having $t(A) = a$ in time roughly proportional to the number of such
+tuples $t$ rather than in time proportional to the size of $R$.
+
+To create an index in SQL the CREATE INDEX command is used. The syntax
+is:
+\begin{verbatim}
+   CREATE INDEX <index_name>
+   ON <table_name> ( <name_of_attribute> );
+\end{verbatim}
+%
+\begin{example}
+To create an index named I on attribute SNAME of relation SUPPLIER
+we use the following statement:
+\begin{verbatim}
+   CREATE INDEX I
+   ON SUPPLIER (SNAME);
+\end{verbatim}
+\end{example}
+%
+The created index is maintained automatically, i.e.\ whenever a new tuple
+is inserted into the relation SUPPLIER the index I is adapted. Note
+that the only change a user can perceive when an index is present
+is an increase in speed.
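+
+\noindent As an illustration, a query of the following shape selects
+tuples by a value of the indexed attribute SNAME and is therefore the
+kind of query the index I can speed up. Whether a particular system
+actually uses the index for it is decided by that system:
+\begin{verbatim}
+   -- Selection on the indexed attribute SNAME
+   SELECT *
+   FROM SUPPLIER
+   WHERE SNAME = 'Jones';
+\end{verbatim}
+The query is written exactly as it would be without the index; the
+index is never mentioned in the query itself.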
+
+\subsubsection{Create View}
+A view may be regarded as a {\it virtual table}, i.e.\ a table that
+does not {\it physically} exist in the database but looks to the user
+as if it did. By contrast, when we talk of a {\it base table} there is
+really a physically stored counterpart of each row of the table
+somewhere in the physical storage.
+
+Views do not have their own, physically separate, distinguishable
+stored data. Instead, the system stores the {\it definition} of the
+view (i.e.\ the rules about how to access physically stored {\it base
+tables} in order to materialize the view) somewhere in the {\it system
+catalogs} (see section \ref{catalogs} {\it System Catalogs}). For a
+discussion on different techniques to implement views refer to section
+\ref{view_impl} {\it Techniques To Implement Views}.
+
+In SQL the CREATE VIEW command is used to define a view. The syntax
+is:
+\begin{verbatim}
+   CREATE VIEW <view_name>
+   AS <select_stmt>
+\end{verbatim}
+where {\tt $<$select\_stmt$>$ } is a valid select statement as defined
+in section \ref{select}. Note that the {\tt $<$select\_stmt$>$ } is
+not executed when the view is created. It is just stored in the {\it
+system catalogs} and is executed whenever a query against the view is
+made.
+\begin{example} Let the following view definition be given (we use
+the tables from figure \ref{supplier} {\it The suppliers and parts
+database} again):
+\begin{verbatim}
+   CREATE VIEW London_Suppliers
+      AS SELECT S.SNAME, P.PNAME
+         FROM SUPPLIER S, PART P, SELLS SE
+         WHERE S.SNO = SE.SNO AND
+               P.PNO = SE.PNO AND
+               S.CITY = 'London';
+\end{verbatim}
+Now we can use this {\it virtual relation} {\tt London\_Suppliers} as
+if it were another base table:
+\begin{verbatim}
+   SELECT *
+   FROM London_Suppliers
+   WHERE PNAME = 'Screw';
+\end{verbatim}
+will return the following table:
+\begin{verbatim}
+                       SNAME | PNAME
+                      -------+-------
+                       Smith | Screw                 
+\end{verbatim}
+To calculate this result the database system has to do a {\it hidden}
+access to the base tables SUPPLIER, SELLS and PART first. It
+does so by executing the query given in the view definition against
+those base tables. After that the additional qualifications (given in the
+query against the view) can be applied to obtain the resulting table.
+\end{example}
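+
+\noindent One way to picture the hidden access is as a single query
+against the base tables in which the qualification of the user query is
+added to the qualifications of the view definition. The following is
+only a sketch of this idea; how a particular system really processes
+views is discussed in section \ref{view_impl}:
+\begin{verbatim}
+   -- View definition plus the condition on PNAME from the user query
+   SELECT S.SNAME, P.PNAME
+   FROM SUPPLIER S, PART P, SELLS SE
+   WHERE S.SNO = SE.SNO AND
+         P.PNO = SE.PNO AND
+         S.CITY = 'London' AND
+         P.PNAME = 'Screw';
+\end{verbatim}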
+
+\subsubsection{Drop Table, Drop Index, Drop View}
+To destroy a table (including all tuples stored in that table) the
+DROP TABLE command is used:
+\begin{verbatim}
+   DROP TABLE <table_name>;
+\end{verbatim}
+%
+\begin{example}
+To destroy the SUPPLIER table use the following statement:
+\begin{verbatim}
+   DROP TABLE SUPPLIER;
+\end{verbatim}
+\end{example}
+%
+The DROP INDEX command is used to destroy an index:
+\begin{verbatim}
+   DROP INDEX <index_name>;
+\end{verbatim}
+%
+Finally to destroy a given view use the command DROP VIEW:
+\begin{verbatim}
+   DROP VIEW <view_name>;
+\end{verbatim}
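+
+\noindent For instance, the index I and the view
+{\tt London\_Suppliers} created in the examples of the previous
+sections could be destroyed as follows:
+\begin{verbatim}
+   DROP INDEX I;
+   DROP VIEW London_Suppliers;
+\end{verbatim}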
+
+\subsection{Data Manipulation}
+%
+\subsubsection{Insert Into}
+Once a table is created (see section \ref{create}), it can be filled
+with tuples using the command INSERT INTO. The syntax is:
+\begin{verbatim}
+   INSERT INTO <table_name> (<name_of_attr_1>
+                             [, <name_of_attr_2> [,...]])
+   VALUES (<val_attr_1>
+           [, <val_attr_2> [, ...]]);
+\end{verbatim}
+%
+\begin{example}
+To insert the first tuple into the relation SUPPLIER of figure
+\ref{supplier} {\it The suppliers and parts database} we use the
+following statement:
+\begin{verbatim}
+   INSERT INTO SUPPLIER (SNO, SNAME, CITY)
+   VALUES (1, 'Smith', 'London');
+\end{verbatim}
+%
+To insert the first tuple into the relation SELLS we use:
+\begin{verbatim}
+   INSERT INTO SELLS (SNO, PNO)
+   VALUES (1, 1);
+\end{verbatim}
+\end{example}
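+
+\noindent The relation PART is filled in the same way. As a further
+sketch, the following statement inserts the tuple for the part 'Bolt',
+using the values listed for it in the result of the subquery example:
+\begin{verbatim}
+   INSERT INTO PART (PNO, PNAME, PRICE)
+   VALUES (3, 'Bolt', 15);
+\end{verbatim}
+The values are assigned to the attributes by position, i.e.\ the first
+value belongs to the first attribute named in the list.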
+
+\subsubsection{Update}
+To change one or more attribute values of tuples in a relation the
+UPDATE command is used. The syntax is:
+\begin{verbatim}
+   UPDATE <table_name>
+   SET <name_of_attr_1> = <value_1>
+       [, ... [, <name_of_attr_k> = <value_k>]]
+   WHERE <condition>;
+\end{verbatim}
+%
+\begin{example}
+To change the value of attribute PRICE of the part 'Screw' in the
+relation PART we use:
+\begin{verbatim}
+   UPDATE PART
+   SET PRICE = 15
+   WHERE PNAME = 'Screw';
+\end{verbatim}
+The new value of attribute PRICE of the tuple whose name is 'Screw' is
+now 15.
+\end{example}
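+
+\noindent Several attributes can be changed by one statement. In the
+following sketch the new values 'Jonas' and 'Berlin' are hypothetical
+and serve only to illustrate the syntax:
+\begin{verbatim}
+   -- Change two attributes of the supplier with SNO 2 at once
+   UPDATE SUPPLIER
+   SET SNAME = 'Jonas',
+       CITY = 'Berlin'
+   WHERE SNO = 2;
+\end{verbatim}
+All tuples that satisfy the condition in the WHERE clause are changed;
+here this is at most one tuple if supplier numbers are unique.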
+
+\subsubsection{Delete}
+To delete a tuple from a particular table use the command DELETE
+FROM. The syntax is:
+\begin{verbatim}
+   DELETE FROM <table_name>
+   WHERE <condition>;
+\end{verbatim}
+\begin{example}
+To delete the supplier called 'Smith' from the table SUPPLIER the
+following statement is used:
+\begin{verbatim}
+   DELETE FROM SUPPLIER
+   WHERE SNAME = 'Smith';
+\end{verbatim}
+\end{example}
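+
+\noindent The condition of a DELETE may itself contain a subquery. As a
+sketch, the suppliers that do not sell any part, which were selected
+with NOT EXISTS in the section on subqueries, could be removed like
+this:
+\begin{verbatim}
+   -- Remove every supplier without a matching tuple in SELLS
+   DELETE FROM SUPPLIER
+   WHERE NOT EXISTS
+             (SELECT * FROM SELLS SE
+              WHERE SE.SNO = SUPPLIER.SNO);
+\end{verbatim}
+With the example data no tuple is deleted, because every supplier
+sells at least one part.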
+%
+\subsection{System Catalogs}
+\label{catalogs}
+In every SQL database system {\it system catalogs} are used to keep
+track of which tables, views, indexes etc.\ are defined in the
+database. These system catalogs can be queried as if they were normal
+relations. For example there is one catalog used for the definition of
+views. This catalog stores the query from the view definition. Whenever
+a query against a view is made, the system first gets the {\it
+view-definition-query} out of the catalog and materializes the view
+before proceeding with the user query (see section \ref{view_impl}
+{\it Techniques To Implement Views} for a more detailed
+description). For more information about {\it system catalogs} refer to
+\cite{date}.
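+
+\noindent The names of the catalogs differ from system to system. As an
+assumption for illustration only, on a system that provides the
+INFORMATION\_SCHEMA views of the SQL standard, the stored definitions
+of the views could be listed with a query like this:
+\begin{verbatim}
+   -- Assumes the system offers INFORMATION_SCHEMA.VIEWS
+   SELECT TABLE_NAME, VIEW_DEFINITION
+   FROM INFORMATION_SCHEMA.VIEWS;
+\end{verbatim}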
+
+\subsection{Embedded SQL}
+
+In this section we will sketch how SQL can be embedded into a host
+language (e.g.\ C). There are two main reasons why we want to use SQL
+from a host language:
+%
+\begin{itemize}
+\item There are queries that cannot be formulated using pure SQL
+(e.g.\ recursive queries). To be able to perform such queries we need a
+host language with a greater expressive power than SQL.
+\item We simply want to access a database from some application that
+is written in the host language (e.g.\ a ticket reservation system
+with a graphical user interface is written in C and the information
+about which tickets are still left is stored in a database that can be
+accessed using embedded SQL).
+\end{itemize}
+%
+A program using embedded SQL in a host language consists of statements
+of the host language and of embedded SQL (ESQL) statements. Every ESQL
+statement begins with the keywords EXEC SQL. The ESQL statements are
+transformed to statements of the host language by a {\it precompiler}
+(mostly calls to library routines that perform the various SQL
+commands). 
+
+When we look at the examples throughout section \ref{select} we
+realize that the result of the queries is very often a set of
+tuples. Most host languages are not designed to operate on sets so we
+need a mechanism to access every single tuple of the set of tuples
+returned by a SELECT statement. This mechanism can be provided by
+declaring a {\it cursor}. After that we can use the FETCH command to
+retrieve a tuple and set the cursor to the next tuple.
+\\ \\
+For a detailed discussion on embedded SQL refer to \cite{date},
+\cite{date86} or \cite{ullman}.
+
+