Thursday, August 19, 2010

An SQL Journey: From Non-Procedural to Procedural

The simplicity of non-procedural languages is really enticing. Non-procedural languages relate to describing the result instead of defining the process. For example, when I visit a restaurant I ask what I want instead of describing how my meal should be cooked. Most of the time, we are asking what we need, instead of describing how we want a particular result to be delivered. We leave the how to experts. So we let the cook decide how to cook the food, allow Google to figure out how to do our Internet search and leave it to our car mechanic to figure out how to fix the car. This allows development of deep expertise in each area. This is the reason we have financial experts, who don't know much about cars, doctors who don't know much about computers and computer engineers who don't know much about the pedagogy of the oppressed. To a certain extent object-oriented languages have been able to mimic this real-world paradigm by defining operations and hiding the "know-how" as encapsulated modules. But all object-oriented languages are still procedural languages, since the programmers use flow controls to define the events and actions.

A non-procedural language defines what is being expected, therefore non-procedural languages don't have flow control.

Let me give you an example:

SELECT FirstName, LastName
FROM STUDENTS_TABLE
WHERE TEACHER = "Peter Drucker"

Here I have defined the result that I'm expecting. Give me the first name and last name of students whose teacher is Peter Drucker. I can get more granular and add course too. For instance,

SELECT FirstName, LastName
FROM STUDENTS_TABLE
WHERE TEACHER = "Peter Drucker"
AND COURSE = "Management 301"

This is a purely non-procedural statement, since nowhere in the statement have I defined how the result should be obtained. Both these statements are valid SQL statements.

When SQL started out, it was based on solid mathematical underpinnings of relational algebra and it was truly non-procedural. However, there has been an unfortunate evolution of SQL. Slowly but surely, more and more procedural constructs have been added to SQL. For example, addition of a CASE statement in SQL-92 was definitely procedural. Vendors have added their own flow control. Sybase added "stored procedures". Rest of the vendors including Oracle (Of course, Microsoft had bought the Sybase source code) quickly followed the lead by adding their own procedures. I think Teradata tried to maintain the purity and elegance of non-procedural SQL for a long time. Teradata had its own vested interest. It was very hard to build a procedural language on a "massively parallel processing" (MPP) platform. Therefore, Teradata stayed true to non-procedural operations of relational algebra for quite some time.

Procedural additions to non-procedural SQL have caused a havoc in programming. Initially, it was impossible to debug procedural SQL due to lack of breakpoints. Companies wasted millions of hours of programming time trying to debug procedural SQL. Of course, I must say that debugging of original non-procedural SQL with multiple sub-queries and correlated sub-queries was equally painful. But technology is always seductive. Technology is the ultimate silver bullet to kill the corporate vampires. Therefore, procedural SQL or stored procedures were widely adopted. Under a newly emerging client/server computing architecture, you could write all your business rules in procedural SQL, store them on the server with the database and minimize the network latency when "client" computers made a request for data. This was the end of non-procedural SQL.

To a certain extent non-procedural SQL was doomed due to its one of its major weaknesses: Purity of logic. Its elegance was its failure. It was just too perfect. Good logic is neither common sense, nor intuitive. I can give you some examples, which you can test in your copious free time:

Example: There are a bunch of car manufacturers, who make multiple brands of cars. Each manufacturer can make one or more brands. Car customers purchase one or more cars each year. Pretty simple, eh! Now, can you write a single (not multi-pass) SQL statement to find all the customers who have purchased all the cars made by a certain manufacturer?

Since I'm having trouble going to sleep and I can't think of anything better, I've setup a simple Microsoft Access database for you.

Here is the download link:

Click here to download Car Database as a Zip file from Google docs

or

Click here to download Car Database as a Zip file
from Box.net

If you don't have Microsoft Access and want to use another database, I've added an Excel Spreadsheet with multiple worksheets corresponding to database tables. The database has five tables as follows: CAR, CUSTOMER, MANUFACTURER, CUSTOMER_CAR, MANUFACTURER_CAR.

This is a database in 3rd normal form, which means that data redundancy has been reduced to downright minimum. CAR table has CarName and CarID, CUSTOMER table has CustomerID, cFirstName and cLastName, MANUFACTURER table has ManufacturerID and ManufacturerName. Such tables are sometimes called lookup tables. Now CUSTOMER_CAR table establishes a relationship between customers and cars. It shows which customer purchased which brand of car in which year. CUSTOMER_CAR table has CustomerID, CarID and BuyYear. Similarly, MANUFACTURER_CAR table establishes relationship between manufacturers and car brands. It shows which manufacturer produces which brand of car. Therefore, MANUFACTURER_CAR table has ManufacturerID, CarID and ReleaseYear.

In order to understand why I called non-procedural SQL as too perfect, try to write a non-procedural single-pass SQL statement that will return

(a) first and last names of all customers who have purchased all the cars released in 1974

(b) first and last names of all customers who have purchased all cars manufactured by Chrysler

(c) first and last names of all customers who have purchased all cars manufactured by Ford

(d) first and last names of all customers who have purchased all cars manufactured either by Chrysler or by Ford - If you got to this point using Microsoft Access, you will know exactly why it has a weak SQL engine.

You can use sub-querries and correlated sub-queries but no procedural or multi-pass SQL.

I guess I'm about to fall asleep now. Therefore, I'll write the solutions in my blog next week.

Until then enjoy these brain teasers!

2 comments:

  1. The download link is not operational

    ReplyDelete
  2. Here is another link to download this file:

    http://www.box.net/shared/tbybkv7ht3

    ReplyDelete