Substitute great info from Stefan Simkovics' Master's Thesis.

author Thomas G. Lockhart

Sat, 13 Feb 1999 03:37:54 +0000 (03:37 +0000)

committer Thomas G. Lockhart

Sat, 13 Feb 1999 03:37:54 +0000 (03:37 +0000)
author Thomas G. Lockhart
Sat, 13 Feb 1999 03:37:54 +0000 (03:37 +0000)
committer Thomas G. Lockhart
Sat, 13 Feb 1999 03:37:54 +0000 (03:37 +0000)
diff --git a/doc/src/sgml/arch-dev.sgml b/doc/src/sgml/arch-dev.sgml

index 076952eb60a1d0e71ec0963cea2a344e69b6e7e3..c04e47cf0bf8830ac6bbc3f6d1c84daa71b49142 100644 (file)
--- a/doc/src/sgml/arch-dev.sgml
+++ b/doc/src/sgml/arch-dev.sgml
@@ -1,81 +1,4083 @@
-
-   Architecture
-
-
-<ProductName>Postgres</ProductName> Architectural Concepts
-
-
-     Before we continue, you  should  understand  the  basic
-     Postgres  system  architecture.   Understanding how the
-     parts of Postgres interact will make the  next  chapter
-     somewhat clearer.
-     In  database  jargon,  Postgres uses a simple "process  
-     per-user" client/server model.  A Postgres session 
-     consists of the following cooperating UNIX processes (programs):
-
-
-
-
-       A supervisory daemon process (postmaster),
-
-
-
-
-       the user's frontend application (e.g., the psql program), and
-
-
-
-
-       the  one or more backend database servers (the postgres process itself).
-
-
-
-
-
-
-     A single  postmaster  manages  a  given  collection  of
-     databases  on  a  single  host.   Such  a collection of
-     databases is called an installation or site.   Frontend
-     applications  that  wish  to  access  a  given database
-     within an installation make calls to the   library.
-     The library sends user requests over the network to the
-     postmaster ((a)), which in turn  starts  a  new
-     backend  server  process ((b)) 
-     
-
-How a connection is established
-
-
-
-     and connects the
-     frontend process to the new server ((c)).  From
-     that  point  on,  the  frontend process and the backend
-     server communicate without intervention by the 
-     postmaster.   Hence, the postmaster is always running, waiting
-     for requests, whereas frontend  and  backend  processes
-     come  and  go.  The libpq library allows a single 
-     frontend to make multiple connections to backend processes.
-     However,  the  frontend  application is still a 
-     single-threaded process.  Multithreaded frontend/backend  
-     connections are not currently supported in libpq.
-     One  implication of this architecture is that the 
-     postmaster and the backend always run on the  same  
-     machine (the  database  server), while the frontend 
-     application may run  anywhere.   You  should  keep  this  
-     in  mind,
-     because  the  files  that  can  be accessed on a client
-     machine may not be accessible (or may only be  accessed
-     using  a  different  filename)  on  the database server
-     machine.
-     You should also be aware that the postmaster and  
-     postgres  servers  run  with  the  user-id  of the Postgres
-     "superuser."  Note that the Postgres superuser does not
-     have  to  be  a special user (e.g., a user named 
-     "postgres").  Furthermore,  the  Postgres  superuser  
-     should
-     definitely  not  be the UNIX superuser, "root"!  In any
-     case, all files relating to a database should belong to
-     this Postgres superuser.
-
-
-
+ 
+  PostgreSQL from the Developer's Point of View
+
+  
+   This chapter gives an overview of the internal structure of the
+   backend of Postgres.
+   After having read the following sections you
+   should have an idea of how a query is processed. Don't expect a
+   detailed description here (I think such a description dealing with
+   all data structures and functions used within Postgres
+   would exceed 1000
+   pages!). This chapter is intended to help understanding the general
+   control and data flow within the backend from receiving a query to
+   sending the results.
+  
+
+  
+   The Path of a Query
+
+   
+    Here we give a short overview of the stages a query has to pass in
+    order to obtain a result.
+   
+
+   
+    
+     
+      A connection from an application program to the Postgres
+      server has to be established. The application program transmits a
+      query to the server and receives the results sent back by the server.
+     
+    
+
+    
+     
+      The parser stage checks the query
+      transmitted by the application
+      program (client) for correct syntax and creates
+      a query tree.
+     
+    
+
+    
+     
+      The rewrite system takes
+      the query tree created by the parser stage and looks for
+      any rules (stored in the
+      system catalogs) to apply to 
+      the querytree and performs the
+      transformations given in the rule bodies.
+      One application of the rewrite system is given in the realization of
+      views.
+     
+
+     
+      Whenever a query against a view
+      (i.e. a virtual table) is made,
+      the rewrite system rewrites the user's query to
+      a query that accesses the base tables given in
+      the view definition instead.
+     
+    
+
+    
+     
+      The planner/optimizer takes
+      the (rewritten) querytree and creates a 
+      queryplan that will be the input to the
+      executor.
+     
+
+     
+      It does so by first creating all possible paths
+      leading to the same result. For example if there is an index on a
+      relation to be scanned, there are two paths for the
+      scan. One possibility is a simple sequential scan and the other
+      possibility is to use the index. Next the cost for the execution of
+      each plan is estimated and the
+      cheapest plan is chosen and handed back.
+     
+    
+
+    
+     
+      The executor recursively steps through
+      the plan tree and
+      retrieves tuples in the way represented by the plan.
+      The executor makes use of the
+      storage system while scanning
+      relations, performs sorts and joins,
+      evaluates qualifications and finally hands back the tuples derived.
+     
+    
+   
+
+   
+    In the following sections we will cover every of the above listed items
+    in more detail to give a better understanding on Postgres's internal
+    control and data structures.
+   
+  
+
+  
+   How Connections are Established
+
+   
+    Postgres is implemented using a simple "process per-user"
+    client/server model.  In this model there is one client process
+    connected to exactly one server process.
+    As we don't know per se 
+    how many connections will be made, we have to use a master process
+    that spawns a new server process every time a connection is
+    requested. This master process is called postmaster and 
+    listens at a specified TCP/IP port for incoming connections. Whenever
+    a request for a connection is detected the postmaster process
+    spawns a new server process called postgres. The server
+    tasks (postgres processes) communicate with each other using
+    semaphores and shared memory
+    to ensure data integrity
+    throughout concurrent data access. Figure
+    \ref{connection} illustrates the interaction of the master process 
+    postmaster the server process postgres and a client
+    application. 
+   
+
+   
+    The client process can either be the psql frontend (for
+    interactive SQL queries) or any user application implemented using
+    the libpg library. Note that applications implemented using
+    ecpg
+    (the Postgres embedded SQL preprocessor for C)
+    also use this library.
+   
+
+   
+    Once a connection is established the client process can send a query
+    to the backend (server). The query is transmitted using plain text,
+    i.e. there is no parsing done in the frontend (client). The
+    server parses the query, creates an execution plan,
+    executes the plan and returns the retrieved tuples to the client
+    by transmitting them over the established connection.
+   
+
+
+
+  
+
+  
+   The Parser Stage
+
+   
+    The parser stage consists of two parts:
+
+    
+     
+      
+       The parser defined in
+       gram.y and scan.l is
+       built using the UNIX tools yacc
+       and lex.
+      
+     
+     
+      
+       The transformation process does
+       modifications and augmentations to the data structures returned by the parser.
+      
+     
+    
+   
+
+   
+    Parser
+
+    
+     The parser has to check the query string (which arrives as
+     plain ASCII text) for valid syntax. If the syntax is correct a
+     parse tree is built up and handed back otherwise an error is
+     returned. For the implementation the well known UNIX
+     tools lex and yacc
+     are used.
+    
+
+    
+     The lexer is defined in the file
+     scan.l and is responsible
+     for recognizing identifiers,
+     the SQL keywords etc. For
+     every keyword or identifier that is found, a token
+     is generated and handed to the parser.
+    
+
+    
+     The parser is defined in the file gram.y and consists of a
+     set of grammar rules and actions
+     that are executed
+     whenever a rule is fired. The code of the actions (which
+     is actually C-code) is used to build up the parse tree.
+    
+
+    
+     The file scan.l is transformed to
+     the C-source file scan.c
+     using the program lex
+     and gram.y is transformed to
+     gram.c using yacc.
+     After these transformations have taken
+     place a normal C-compiler can be used to create the
+     parser. Never make any changes to the generated C-files as they will
+     be overwritten the next time lex
+     or yacc is called.
+
+     
+      
+       The mentioned transformations and compilations are normally done
+       automatically using the makefiles
+       shipped with the Postgres
+       source distribution.
+      
+     
+    
+
+    
+     A detailed description of yacc or
+     the grammar rules given
+     in gram.y would be beyond the scope of this paper. There are
+     many books and documents dealing with lex
+     and yacc. You
+     should be familiar with yacc before you start to study the
+     grammar given in gram.y otherwise you won't understand what
+     happens there.
+    
+
+    
+     For a better understanding of the data structures used in Postgres
+     for the processing of a query we use an example to illustrate the
+     changes made to these data structures in every stage.
+    
+
+    
+     A Simple Select
+
+     
+      This example contains the following simple query that will be used in
+      various descriptions and figures throughout the following
+      sections. The query assumes that the tables given in
+      The Supplier Database
+      
+      have already been defined.
+
+      
+  select s.sname, se.pno
+  from supplier s, sells se
+  where s.sno > 2 and
+        s.sno = se.sno;
+      
+     
+    
+
+    
+     Figure \ref{parsetree} shows the parse tree built by the
+     grammar rules and actions given in gram.y for the query
+     given in  
+     (without the operator tree for
+     the where clause which is shown in figure \ref{where_clause}
+     because there was not enough space to show both data structures in one
+     figure).
+    
+
+    
+     The top node of the tree is a SelectStmt node. For every entry
+     appearing in the from clause of the SQL query a RangeVar
+     node is created holding the name of the alias and a pointer to a
+     RelExpr node holding the name of the relation. All
+     RangeVar nodes are collected in a list which is attached to the field
+     fromClause of the SelectStmt node.
+    
+
+    
+     For every entry appearing in the select list of the SQL query a
+     ResTarget node is created holding a pointer to an Attr
+     node. The Attr node holds the relation name of the entry and
+     a pointer to a Value node holding the name of the
+     attribute.
+     All ResTarget nodes are collected to a list which is
+     connected to the field targetList of the SelectStmt node.
+    
+
+    
+     Figure \ref{where_clause} shows the operator tree built for the
+     where clause of the SQL query given in example
+     
+     which is attached to the field
+     qual of the SelectStmt node. The top node of the
+     operator tree is an A_Expr node representing an AND
+     operation. This node has two successors called lexpr and
+     rexpr pointing to two subtrees. The subtree attached to
+     lexpr represents the qualification s.sno > 2 and the one
+     attached to rexpr represents s.sno = se.sno. For every
+     attribute an Attr node is created holding the name of the
+     relation and a pointer to a Value node holding the name of the
+     attribute. For the constant term appearing in the query a
+     Const node is created holding the value.
+    
+
+
+
+   
+
+   
+     Transformation Process
+
+    
+     The transformation process takes the tree handed back by
+     the parser as input and steps recursively through it.  If
+     a SelectStmt node is found, it is transformed
+     to a Query
+     node which will be the top most node of the new data structure. Figure
+     \ref{transformed} shows the transformed data structure (the part
+     for the transformed where clause is given in figure
+     \ref{transformed_where} because there was not enough space to show all
+     parts in one figure).
+    
+
+    
+     Now a check is made, if the relation names in the
+     FROM clause are known to the system. For every relation name
+     that is present in the system catalogs a RTE node is
+     created containing the relation name, the alias name and
+     the relation id. From now on the relation ids are used to
+     refer to the relations given in the query. All RTE nodes
+     are collected in the range table entry list which is connected
+     to the field rtable of the Query node. If a name of a
+     relation that is not known to the system is detected in the query an
+     error will be returned and the query processing will be aborted.
+    
+
+    
+     Next it is checked if the attribute names used are 
+     contained in the relations given in the query. For every
+     attribute} that is found a TLE node is created holding a pointer
+     to a Resdom node (which holds the name of the column) and a
+     pointer to a VAR node. There are two important numbers in the
+     VAR node. The field varno gives the position of the
+     relation containing the current attribute} in the range
+     table entry list created above. The field varattno gives the
+     position of the attribute within the relation. If the name
+     of an attribute cannot be found an error will be returned and
+     the query processing will be aborted.
+    
+
+
+
+  
+
+  
+   The <productname>Postgres</productname> Rule System
+
+   
+    Postgres supports a powerful
+    rule system for the specification
+    of views and ambiguous view updates.
+    Originally the Postgres
+    rule system consisted of two implementations:
+
+    
+     
+      
+       The first one worked using tuple level processing and was
+       implemented deep in the executor. The rule system was
+       called whenever an individual tuple had been accessed. This
+       implementation was removed in 1995 when the last official release
+       of the Postgres project was transformed into 
+       Postgres95. 
+      
+     
+
+     
+      
+       The second implementation of the rule system is a technique
+       called query rewriting.
+       The rewrite system} is a module
+       that exists between the parser stage and the
+       planner/optimizer. This technique is still implemented.
+      
+     
+    
+   
+
+   
+    For information on the syntax and creation of rules in the
+    Postgres system refer to
+    The PostgreSQL User's Guide.
+   
+
+   
+    The Rewrite System
+
+    
+     The query rewrite system is a module between
+     the parser stage and the planner/optimizer. It processes the tree handed
+     back by the parser stage (which represents a user query) and if
+     there is a rule present that has to be applied to the query it
+     rewrites the tree to an alternate form.
+    
+
+    
+     Techniques To Implement Views
+
+     
+      Now will sketch the algorithm of the query rewrite system. For
+      better illustration we show how to implement views using rules
+      as an example.
+     
+
+     
+      Let the following rule be given:
+
+      
+  create rule view_rule
+  as on select 
+  to test_view
+  do instead
+     select s.sname, p.pname
+     from supplier s, sells se, part p
+     where s.sno = se.sno and
+           p.pno = se.pno;   
+      
+     
+
+     
+      The given rule will be fired whenever a select
+      against the relation test_view is detected. Instead of
+      selecting the tuples from test_view the select statement
+      given in the action part of the rule is executed.
+     
+
+     
+      Let the following user-query against test_view be given:
+
+      
+  select sname 
+  from test_view
+  where sname <> 'Smith';
+      
+     
+
+     
+      Here is a list of the steps performed by the query rewrite
+      system whenever a user-query against test_view appears. (The
+      following listing is a very informal description of the algorithm just
+      intended for basic understanding. For a detailed description refer
+      to ).
+     
+
+     
+      <literal>test_view</literal> Rewrite
+      
+       
+   Take the query given in the action part of the rule.
+       
+      
+
+      
+       
+   Adapt the targetlist to meet the number and order of
+   attributes given in the user-query.
+       
+      
+
+      
+       
+   Add the qualification given in the where clause of the
+   user-query to the qualification of the query given in the
+   action part of the rule.
+       
+      
+     
+
+     
+      Given the rule definition above, the user-query will be
+      rewritten to the following form (Note that the rewriting is done on
+      the internal representation of the user-query handed back by the
+      parser stage but the derived new data structure will represent the following
+      query):
+
+      
+  select s.sname
+  from supplier s, sells se, part p
+  where s.sno = se.sno and
+        p.pno = se.pno and
+        s.sname <> 'Smith';
+      
+     
+   
+  
+
+  
+   Planner/Optimizer
+
+   
+    The task of the planner/optimizer is to create an optimal
+    execution plan. It first combines all possible ways of
+    scanning and joining
+    the relations that appear in a
+    query. All the created paths lead to the same result and it's the
+    task of the optimizer to estimate the cost of executing each path and
+    find out which one is the cheapest.
+   
+
+   
+    Generating Possible Plans
+
+    
+     The planner/optimizer decides which plans should be generated
+     based upon the types of indices defined on the relations appearing in
+     a query. There is always the possibility of performing a
+     sequential scan on a relation, so a plan using only
+     sequential scans is always created. Assume an index is defined on a
+     relation (for example a B-tree index) and a query contains the
+     restriction
+     relation.attribute OPR constant. If
+     relation.attribute happens to match the key of the B-tree
+     index and OPR is anything but '<>' another plan is created using
+     the B-tree index to scan the relation. If there are further indices
+     present and the restrictions in the query happen to match a key of an
+     index further plans will be considered.
+    
+
+    
+     After all feasible plans have been found for scanning single
+     relations, plans for joining relations are created. The
+     planner/optimizer considers only joins between every two relations for
+     which there exists a corresponding join clause (i.e. for which a
+     restriction like where rel1.attr1=rel2.attr2 exists) in the
+     where qualification. All possible plans are generated for every
+     join pair considered by the planner/optimizer. The three possible join
+     strategies are:
+
+     
+      
+       
+   nested iteration join: The right relation is scanned
+   once for every tuple found in the left relation. This strategy
+   is easy to implement but can be very time consuming.
+       
+      
+
+      
+       
+   merge sort join: Each relation is sorted on the join
+   attributes before the join starts. Then the two relations are
+   merged together taking into account that both relations are
+   ordered on the join attributes. This kind of join is more
+   attractive because every relation has to be scanned only once.
+       
+      
+
+      
+       
+   hash join: the right relation is first hashed on its
+   join attributes. Next the left relation is scanned and the
+   appropriate values of every tuple found are used as hash keys to
+   locate the tuples in the right relation.
+       
+      
+     
+    
+   
+
+   
+    Data Structure of the Plan
+
+    
+     Here we will give a little description of the nodes appearing in the
+     plan. Figure \ref{plan} shows the plan produced for the query in
+     example \ref{simple_select}.
+    
+
+    
+     The top node of the plan is a MergeJoin node which has two
+     successors, one attached to the field lefttree and the second
+     attached to the field righttree. Each of the subnodes represents
+     one relation of the join. As mentioned above a merge sort
+     join requires each relation to be sorted. That's why we find
+     a Sort node in each subplan. The additional qualification given
+     in the query (s.sno > 2) is pushed down as far as possible and is
+     attached to the qpqual field of the leaf SeqScan node of
+     the corresponding subplan.
+    
+
+    
+     The list attached to the field mergeclauses of the
+     MergeJoin node contains information about the join attributes.
+     The values 65000 and 65001
+     for the varno fields in the
+     VAR nodes appearing in the mergeclauses list (and also in the
+     targetlist) mean that not the tuples of the current node should be
+     considered but the tuples of the next "deeper" nodes (i.e. the top
+     nodes of the subplans) should be used instead.
+    
+
+    
+     Note that every Sort and SeqScan node appearing in figure
+     \ref{plan} has got a targetlist but because there was not enough space
+     only the one for the MergeJoin node could be drawn.
+    
+
+    
+     Another task performed by the planner/optimizer is fixing the
+     operator ids in the Expr
+     and Oper nodes. As
+     mentioned earlier, Postgres supports a variety of different data
+     types and even user defined types can be used. To be able to maintain
+     the huge amount of functions and operators it is necessary to store
+     them in a system table. Each function and operator gets a unique
+     operator id. According to the types of the attributes used
+     within the qualifications etc., the appropriate operator ids
+     have to be used.
+    
+   
+  
+
+  
+   Executor
+
+   
+    The executor takes the plan handed back by the
+    planner/optimizer and starts processing the top node. In the case of
+    our example (the query given in example \ref{simple_select}) the top
+    node is a MergeJoin node. 
+   
+
+   
+    Before any merge can be done two tuples have to be fetched (one from
+    each subplan). So the executor recursively calls itself to
+    process the subplans (it starts with the subplan attached to
+    lefttree). The new top node (the top node of the left subplan) is a
+    SeqScan node and again a tuple has to be fetched before the node
+    itself can be processed. The executor calls itself recursively
+    another time for the subplan attached to lefttree of the
+    SeqScan node.
+   
+
+   
+    Now the new top node is a Sort node. As a sort has to be done on
+    the whole relation, the executor starts fetching tuples
+    from the Sort node's subplan and sorts them into a temporary
+    relation (in memory or a file) when the Sort node is visited for
+    the first time. (Further examinations of the Sort node will
+    always return just one tuple from the sorted temporary
+    relation.)
+   
+
+   
+    Every time the processing of the Sort node needs a new tuple the
+    executor is recursively called for the SeqScan node
+    attached as subplan. The relation (internally referenced by the
+    value given in the scanrelid field) is scanned for the next
+    tuple. If the tuple satisfies the qualification given by the tree
+    attached to qpqual it is handed back, otherwise the next tuple
+    is fetched until the qualification is satisfied. If the last tuple of
+    the relation has been processed a NULL pointer is
+    returned.
+   
+
+   
+    After a tuple has been handed back by the lefttree of the
+    MergeJoin the righttree is processed in the same way. If both
+    tuples are present the executor processes the MergeJoin
+    node. Whenever a new tuple from one of the subplans is needed a
+    recursive call to the executor is performed to obtain it. If a
+    joined tuple could be created it is handed back and one complete
+    processing of the plan tree has finished. 
+   
+
+   
+    Now the described steps are performed once for every tuple, until a
+    NULL pointer is returned for the processing of the
+    MergeJoin node, indicating that we are finished.
+   
+
+  
+
+
+
+ 
+
+
author	Thomas G. Lockhart
	Sat, 13 Feb 1999 03:37:54 +0000 (03:37 +0000)
committer	Thomas G. Lockhart
	Sat, 13 Feb 1999 03:37:54 +0000 (03:37 +0000)