
Saturday, March 6, 2010

Coherent Unix on a 286


Jack Dikian

ABSTRACT

Good Things Still Come In Small Packages

First came Unix Version 7 from AT&T, and now we have Coherent from Mark Williams Co. Unix was small, simple and cheap. Unix is still small, simple and very, very cheap. This paper takes a close look at the youngest and smallest kid on the block, and compares it with its older and bigger brothers, as well as its opponents.

This is a review of Coherent Unix. It was a long time ago, almost ten years ago in fact, when John Lions published what was a complete, annotated listing of AT&T’s Version 7 kernel in the form of two booklets that together are no larger than this issue of AUUGN. Those documents, along with the University of New South Wales’ "1980 Unix Companion" [UNSW], and "The C Programming Language" [R&K], more than anything else provided us with an extraordinary insight into Unix, the Unix philosophy, and its implementation language. For many, including myself, those documents were in a very real sense invaluable. Those documents, however, also represented something else; they were terse. The size of those notes made it possible to have them on hand virtually everywhere we went.

It is significant, therefore, that Coherent Unix is supplied with a single 1100 page manual that is on one hand very reminiscent of those early works, yet on the other borrows much from more modern and accessible styles such as that found in Kernighan and Pike's "The Unix Programming Environment" [K&P]. The Coherent manual contains all the information that the user needs to install, use and, importantly, learn Unix. The manual covers the traditional Unix sections, namely the supported commands, system calls and subroutines, as well as excellent chapters providing tutorial-like presentations. These include system administration, UUCP, awk, the C language, ed, lex, the m4 macro processor, make, MicroEMACS, text formatting, the shell and yacc.

I am placing special emphasis on the quality of the supplied manual for one very important reason; this variant of Unix potentially has a very ready market niche as a low-end Unix training platform. Coherent also comes into its own through its use as a cost effective UUCP node, or to provide the DOS user with an alternative vista. "Coherent, A Multi-user, Multi-tasking Operating System For The IBM-PC/AT and Compatible 286 or 386 Based Computers" [Coherent]. This brief product sketch, printed on the cover jacket of the manual, provides the PC enthusiast with enough flavour of Unix to encourage further curiosity. Coherent Unix comes from the 13-year-old compiler vendor Mark Williams Company, based at 60 Revere Drive, Northbrook, Illinois 60062 (uunet!mwc!sales). For US$99.95 you receive a Unix look-alike with a 60-day money back guarantee, an excellent user manual and free technical telephone support.

This system is shipped on four 3.5 inch high density floppy disks with a single copy of the Coherent system manual. A registration card contains a nine digit serial number that the install program prompts for during the installation process.

The hardware requirements for this system are very modest when compared with even some DOS applications such as Microsoft Windows. The system requires an IBM AT or clone with 100% compatibility; it does not work on any of the MicroChannel platforms. You also need one high density 3.5" or 5.25" floppy drive, a hard disk with at least 10MB free space, and a minimum of 640K RAM. The manual claims that the system will work with RLL, MFM and most ESDI disk controllers. It should also work with some SCSI host adapters. Coherent includes device drivers for line printers, HP laser printers, COM1 to COM4, RAM disks, tape drives, and the Adaptec SCSI disk controller. ESDI controllers include Ultrascope, Western Digital, and multiport from Amet, Emulex and SEFCO. I suspect, however, that you need to take a close look at exactly what is and isn't supported. The release notes list more than 100 compatible systems, memory boards and disk controllers.

The preparation and installation took me approximately two hours to complete. In theory the actual Coherent install should only take about half an hour, depending on your CPU, but if you, like me, decide to partition the disk between DOS and Unix then you will need to back up your whole disk before you commence the installation. I carried out the installation on a very old 286 clone with 640K and a 40MB disk running DOS 4 with the Gemini EGA 2.4 BIOS. The provided install program drives the user through the installation process from start to finish. It is no more difficult to install Coherent than it is to install any DOS application.

Absolutely no prior knowledge of Unix is required. By far the trickiest section of the installation is when you are asked to re-partition the hard disk. Here you can nominate how much space you wish to allocate to Coherent and DOS, as well as define the active partition. The operating system mounted on the active partition is booted automatically on start-up. The install program copes very well with the system it is being run on, and tries very hard to prompt you with specific and helpful messages as you go. Once a partition has been allocated to Coherent, the install process scans the nominated partition for bad blocks and makes the file system. You are now ready to reboot the system. The operating system on the active partition boots by default; if you load Coherent on the non-active partition, then you will need to press the number corresponding to the Coherent partition while the system is booting. If Coherent comes up OK, then the remaining three floppies are copied. This step takes a significant part of the overall installation time.

Uncompressing the man pages and the spell dictionary etc. is slow. Coherent with man pages and dictionary takes up 7MB. This leaves me 13MB of user file space on the Coherent partition, and a further 20MB of DOS space. I should mention that the Norton Utilities [Norton] came in very handy at this point, because the data remaining on the 20MB DOS partition was almost unusable. It took only minutes for Norton to make sense of the broken directories and help repair them. Coherent Unix comes up multi-user after carrying out a rather slow fsck (3 minutes for 7MB on a 286) and prompts for a login with "Coherent login:". At first you get the feeling that you are using a dumb terminal connected to a large AT&T SYS V rel 2 site. /bin looks quite comprehensive, but a closer inspection soon tells you why this is the small kid on the block. No POSIX compliance, X Windows or NFS. The C compiler is fast, but does not support medium and large models on the 286. Source code is not included, csh is not available, and there is no off-the-shelf software. Coherent does, however, fit into 640K of memory (it can address up to 16MB), with the kernel using up a whole 77K. It does give you text formatting facilities through nroff with ms.

The manual also provides a 65 page chapter introducing nroff with very relevant examples. UUCP, as mentioned earlier, is supplied via uuinstall, uucp, uucico, uuxqt, uulog, uuname and uutouch. Once again, the large Remote Communications Utility chapter takes away a lot of the black magic from establishing uucp links. The public domain MicroEMACS is included, as is kermit. The stream editor sed, ed and elvis (vi) are well implemented. I found the yacc presentation and program examples especially quick to implement and easy to learn from. The C compiler, an assembler (for subroutines only), awk and the shell provide a well rounded development suite for training, if not for developing real systems. No single platform supporting a dual operating system is complete without a data communication mechanism. Coherent provides a tar-like utility called dos which allows the Coherent user to manipulate an MS-DOS file system. It can format or label an MS-DOS file system, list the files in it, transfer files between it and Coherent, or delete files from it. If you wish you can also buy a device driver toolkit for US$39.95. Yes, there are other kids on the block; however, Coherent is by far the best dressed for the price. I am going to take a quick look at three other products with which Coherent contends. The first is not really an operating system, but rather a suite of layered utilities called the MKS (Mortice Kern Systems) Toolkit. MKS sits on top of DOS and provides over 100 System V commands including the Korn shell and vi.

However, there are no development tools, and because of its dependence on DOS there are no multi-user/multi-tasking facilities. MKS costs $250.00. The second player is Minix (Mini Unix) from Prentice Hall, which is based on AT&T's Version 7 and is supplied, with source, on twelve 3.5" floppy disks. Minix sits on the host hardware and requires at least a 10MB partition if source is to be included. Although there is no UUCP support, Minix does feature networking, rcp and Ethernet. The third, SCO (Santa Cruz Operation) XENIX [SCO], is really a heavyweight in features and price when compared with Coherent. SCO has a 198K kernel and requires at least 1 to 2MB of memory and 30MB of disk. It costs $1495.00. In conclusion, Coherent Unix from Mark Williams Co. is a truly high performance-for-value product. It combines the power and flexibility of Unix with the accessibility of PC based technologies. The manual is excellent, and alone is comparable to many speciality books costing many tens of dollars. The training sector is by far the most suitable environment for this product. Not only can this system be used to teach Unix concepts, but other areas such as C, shell, systems administration and text formatting can be mastered with it. The system also lends itself as an ideal UUCP node. The Mark Williams Company claims it already has 10,000 satisfied users... make that 10,001.

References

[UNSW] Unix Companion, 1980, University of NSW.

[R&K] Kernighan, B. & Ritchie, D. The C Programming Language. Prentice Hall, 1978.

[K&P] Kernighan, B. & Pike, R. The Unix Programming Environment. Prentice Hall, 1984.

[Coherent] Coherent Manual, 1990, Mark Williams Co.

[SCO] Tech Specialist Journal, January 1991.

[Norton] Norton Utilities, Advanced Edition 4.50, 1987-1988, Peter Norton.


Friday, March 5, 2010

Grammatical Extensions to the Structured Query Language SQL



Jack Dikian

ABSTRACT

SQL+sh is an interactive front-end to Unify's Structured Query Language (SQL). Its main purpose is to add csh/tenex-like functionality to a vanilla SQL query interpreter.

A query history stack, the ability to recall and edit previous queries, and an interactive RECORD and FIELD name recognition and completion mechanism are a sample of the sort of enhancements SQL+sh supports. This paper presents a brief background to SQL before discussing some of the features we added to this package. Working in an environment where a significant portion of a programmer's time is spent writing and maintaining applications software around the Unify Relational Database, any facility that simplifies database interactions must be an advantage. This database is quickly approaching the 2 G-byte mark with over 300 Mb of supporting software. Like other large database users, we find the overhead of database related maintenance a significant consideration. Improvements in database related utilities greatly increase productivity as well as reliability.

One of the most powerful facilities available to the maintenance programmer in our environment is Unify's Structured Query Language, SQL. This utility is often used to interrogate as well as patch the underlying database. Ad hoc SQL queries are often generated to confirm the correctness of application modules, as well as to serve the simpler day to day user information requirements.


A Quick Look At SQL

SQL is an English keyword oriented query language of great flexibility. It is a language that is easy enough for non-programmers to learn, yet has enough power for data processing professionals. It was originally defined by Chamberlin and others at the IBM Research Laboratory in San Jose, California, under the name System R. A family of IBM products based on the System R technology was developed. These products are now generally available and are known as DB2, SQL/DS and QMF [1].

A number of other vendors have also produced systems that support SQL. SQL's data manipulation statements typically operate on entire sets of records. For example, the select and update clauses can retrieve and modify sets of values and tables. SQL, like all relational data manipulation languages, is a set-level language. For this reason, SQL is often described as a non-procedural language: the user specifies "what" data they want and not so much "how" to get it.

It is up to SQL to decide how best to execute any particular query. It needs, for example, to consider which tables are being referenced in any request; the size of the tables; what indexes exist; how selective those indexes are; and, of course, the form of the where clause. SQL queries consist of clauses, each of which is preceded by a keyword. Examples of keywords include select, update, delete and insert. In fact, these four keywords all belong to the part of SQL which is commonly referred to as the DML, or Data Manipulation Language. Other optional keywords are used to control, format and operate on the various queries. Some simple examples of queries are given below:-

> select Name, Phone
> from PERSONS
> where Age > 30/

The above example illustrates selecting, or retrieving, the specified fields Name and Phone from a specified table PERSONS where some specified condition is true. It is important to note that the result of the query is another table.

> select PERSON.*, COMPANY.*
> from PERSON, COMPANY
> where PERSON.PName = COMPANY.CName/

This example demonstrates the retrieving of data from two tables, namely PERSON and COMPANY. We are interested in all instances where the field PName in PERSON matches the field CName in the table COMPANY. This is commonly referred to as "joining" two or more tables. The availability of the join operation is, perhaps more than anything else, what distinguishes relational from non-relational systems.


The SQL+sh

Our main database currently supports over 100 tables and close to 1,000 fields. Using SQL to interrogate and manipulate data in this environment almost always requires the programmer to first browse through the database schema listing. This is due not only to the large number of different tables and fields but also to Unify's record and field naming conventions. The maximum length of a record name is eight characters. It is therefore impossible to create two records with the names "PROGRAMMER" and "PROGRAMME". A compromise may lead to the names "PROGMR" and "PROGME" etc. It is easy to see why the schema listing may be required in such cases. Creating tables in Unify requires the user to nominate both a short and a long field name. Short field names must begin with a letter and can be up to eight characters long. Long field names begin with a letter and can be up to sixteen characters long. It is the long name that SQL requires for carrying out queries.

The schema is used to determine or look up this long name. The schema is also used to determine relationships between tables and their corresponding fields. Editing large queries is handled by SQL writing the last query to /tmp. The edit facility invokes a standard editor such as vi with the last query loaded into the editor buffer. The user modifies and saves the changes before using the restart clause to re-execute the query. Although this facility is useful, it is often tedious, especially when a simple typo needs to be repaired. Because only the last query is effectively saved, access to previous queries is lost unless the user explicitly saves the editor buffer to a nominated file. Interestingly, we required in SQL a similar transformation in functionality to that provided by, say, csh and tcsh over the Bourne shell. Where tcsh provides file name recognition and completion, we required record and field name recognition and completion.

Where csh provides a history and edit facility for commands, we required a history and edit mechanism for queries. In implementing some of the ideas found in csh and tcsh, we were able to address both of the above mentioned shortcomings as well as provide a much more effective user interface. Not having access to SQL source, the only alternative for implementing the above changes was to write our own parser sitting on top of SQL. This would simply read the input stream, decide whether it needs to act upon it, manipulate the history stack, carry through edit commands, expand aliases etc., and then write to SQL via a pipe. The output of SQL is not, and should not be, altered.
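The front-end idea can be sketched roughly as below. This is a hypothetical illustration, not the SQL+sh source: the function names are invented, and a real implementation would sit in a read loop forwarding the returned line down a pipe to the SQL process, leaving SQL's own output untouched.

```python
# Minimal sketch of the parser-on-top-of-SQL idea.
history = []  # the query history stack, one entry per input line

def resolve(line):
    """Expand simple '!!' and '!number' history references,
    then record the resulting line as a new event."""
    if line == "!!" and history:
        line = history[-1]                  # re-execute the previous event
    elif line.startswith("!") and line[1:].isdigit():
        line = history[int(line[1:]) - 1]   # '!number': recall event by number
    history.append(line)
    return line                             # caller writes this to SQL via the pipe
```

The key design point, as the text notes, is that only the input stream is intercepted; whatever SQL prints in response is passed through unmodified.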

SQL+sh reads a schema description file on startup. This file is typically generated by the systems administrator by running a specially written shell script. The description file describes the database tables, their respective fields, and other information such as field type and length. The shell script uses SQL to dump the relevant table and field types and names. On startup, SQL+sh looks at the environment variable DBPATH and displays the name and address of the working database. After this point, SQL+sh enters a forever loop waiting for queries, internal commands and/or the end clause. A new prompt including the event number is displayed. An environment variable defines the maximum history size.

An internal command called "Mod On/Off" has been added, which enables and disables the availability of non-passive SQL clauses. For example, after entering the command "Mod Off", clauses such as delete, update and insert are disabled or ignored. This is useful in cases where support staff use SQL to answer quick telephone queries and should not update the database inadvertently.

Unlike Unix commands, which are newline terminated, SQL queries often span many lines. In fact, users of SQL are encouraged to use good formatting procedures when making SQL queries. This is in part due to the fact that quite complex SQL scripts can be written and saved for regular use. These scripts are also used to feed data to Unify's report generator RPT. The "/" character is used to indicate the end of a query. For this reason, SQL+sh supports a modified history substitution command in the way of "!event+". This signals SQL+sh to re-execute the query beginning with the event number "event" and continue to re-execute events forward in the stack until a "/" character is encountered. All other normal history substitution commands such as "!!", "!-number", "!number" as well as "!pattern" etc. have been implemented. Where a query spans many lines, SQL+sh collects together the individual clauses to echo a single event in its history stack.
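The "!event+" rule can be modelled as follows. This is a simplified sketch under the assumption that the history stack is a plain list of single-line events; the function name is invented for illustration.

```python
def replay_from(history, event):
    """Sketch of '!event+': collect events from number `event` forward
    until one ends with the '/' query terminator."""
    collected = []
    for line in history[event - 1:]:        # event numbers are 1-based
        collected.append(line)
        if line.rstrip().endswith("/"):     # '/' closes the query
            break
    return collected
```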

Editing previous queries is handled in two ways. The standard SQL procedure is to invoke the system editor with the last query loaded into the editor buffer; the edit clause facilitates this procedure. This method is still available and is usually used for editing large query texts, but it allows only the last query to be modified and executed. SQL+sh introduces the csh-like "!event s/pattern1/pattern2/" and "^pattern1^pattern2^" mechanisms. These are extremely convenient for repairing typos and/or for substituting record or field names while leaving the general structure of the query untouched.
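The quick-substitution form works along these lines. A minimal model, assuming the command is exactly ^old^new^ with no trailing text; the helper name is hypothetical.

```python
def quick_fix(last_query, command):
    """Apply a csh-style ^old^new^ repair to the previous query."""
    _, old, new = command.rstrip("^").split("^", 2)
    return last_query.replace(old, new, 1)  # repair the first occurrence only
```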

One of the most useful additions to SQL was the introduction of record and field name recognition and completion. The idea here was to provide a convenient way to avoid having to look up record and field names before generating queries. Automatically displaying field types and lengths was also considered useful. Other considerations included providing a means by which key strokes could be reduced, and accurately associating relevant field names with their correct parent tables. This mechanism is used in conjunction with the database schema description file. It is no longer necessary to type a complete record or field name; only a unique abbreviation is necessary. Typing the ESCAPE key after the abbreviation will complete the record or field name, echoing the full name. Unlike tcsh, where there is really only one type of file name completion, SQL+sh needs to consider context and determine whether a record or a field name is being sought. This is achieved by adding some of the SQL syntax rules into SQL+sh.

For example the following grammar extracts the syntax for the insert and select clauses:-

insert into RECORD [( FIELD, ... )]
    { from filename | select ... }

select ["unique"] { * | RECORD.* | RECORD.FIELD | FIELD } , ...
    from RECORD [label] , ...
    where ["not"] { FIELD | RECORD.FIELD | constant } ...


SQL+sh tries to carry out a search of either the appropriate record or field based on the position at which the ESCAPE key was pressed in the input stream. It is obvious from the above two syntax examples that it is not always possible to determine whether a RECORD or a FIELD needs expanding. In the select clause, for example, it is possible to say "select record.field from ..." or "select field from ...". Hitting the ESCAPE key just after the select token leaves SQL+sh with a choice of searching for appropriate records or fields. In fact, in this particular example, the system will first search through the record list and then the field list. In general, as each word is read, SQL+sh updates a flag indicating whether it is in a "RECORD" or "FIELD" state. This flag is initially set to a "NULL" state, thus causing an alert when the ESCAPE key is pressed. A "BOTH" state causes SQL+sh to search records and then fields. This state is established by tracking entered words against various syntax rules defined in SQL+sh. We have also provided a means of commenting query text. Text found enclosed within the "{" and "}" braces is ignored. This facility was implemented in order to allow a clean method of displaying field types and lengths in-line. On hitting ESCAPE in a "FIELD" state, the system will not only display a candidate field name but also place the relevant field type and length, already commented.
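The state tracking and completion described above can be sketched as below. This is a toy model: the real rules cover far more of the SQL grammar than the few keywords used here, and the record and field lists are invented samples standing in for the schema description file.

```python
RECORDS = ["PERSON", "COMPANY"]                 # sample names, normally loaded
FIELDS = ["PName", "Paddress", "PAge", "CName"]  # from the schema description file

def state_of(tokens):
    """Track whether the next name to complete is a RECORD, a FIELD, or BOTH."""
    state = "NULL"                          # NULL causes an alert on ESCAPE
    for tok in tokens:
        if tok == "from":
            state = "RECORD"
        elif tok in ("select", "where", "insert"):
            state = "BOTH"                  # search records first, then fields
    return state

def complete(abbrev, state):
    """On ESCAPE: expand a unique abbreviation against the candidate lists."""
    pool = {"RECORD": RECORDS, "FIELD": FIELDS,
            "BOTH": RECORDS + FIELDS}.get(state, [])
    matches = [name for name in pool if name.startswith(abbrev)]
    return matches[0] if len(matches) == 1 else None  # None: ambiguous or unknown
```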

Besides providing a recognition and completion mechanism, SQL+sh also provides a facility where fields belonging to a particular record can be scanned. For example, after having typed in the sub-clause

select * from PERSON where

it is possible to hit Ctrl-F to echo the first field belonging to the PERSON record.


Hitting Ctrl-F again will replace the first displayed field name with the next field. When the list of fields is exhausted, the process is repeated. This allows the user to carry out a query on a record even when they have no idea of the field names associated with the given record. The field type and length are once again displayed in comments. Some examples follow:-

select * from PE<ESC>
select * from PERSON

results in the cursor sitting at the next column position, waiting for the rest of the query.

select * from PERSON where <Ctrl-F>

results in

select * from PERSON where PName {STRING 12}

Hitting Ctrl-F again results in

select * from PERSON where Paddress {STRING 45}

The user can now enter the rest of the query:

select * from PERSON where Paddress {STRING 45} = 'Bag End*'

Hitting Ctrl-F here again results in

PAge {NUMERIC 3}

and the rest of the query can now be entered:

PAge {NUMERIC 3} <= 111/

Often there is the need to carry out repetitive queries involving tests against large text constants such as "0 081 12346789050" and "Speak Friend And Enter". An ability to implement a concept of macros was also considered a useful enhancement. The same query is often re-executed many times over in the course of a database maintenance session; one or more parameters in the query may however vary. An ability to expand VMS-like "logical variables" was added to SQL+sh. The same variable setting and expansion mechanism is used to set and unset simple and complex variables. There is no inherent difference between variable substitution and macro processing; the difference is operational. SQL+sh maintains a set of variables, each of which has as a value a list of zero or more words. Each word in this list may be a simple constant or another variable. This value may be displayed and changed by using the internal commands show and clear. After the input line is parsed, and before each query is executed, variable substitution is performed. Variables are keyed by the '$' character. The expansion can be prevented by preceding the '$' with a '\', except within quotes. A macro with a single argument can be seen as a variable containing another variable in its assignment string; the second variable has to be resolved before the macro can be executed. Newline characters found in the assignment list are ignored. Looping is prevented by checking that the same variable does not appear in the assignment list of that variable.

Examples of variables follow:-

[1] $new_name = "Bilbo Baggins"
[2] $my_update = "update PERSON
[3] set PName = $new_name
[4] where PName = ' *'/"
[5] $my_update
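The substitution pass with loop prevention can be sketched like this. A hypothetical model under the assumptions above; the real SQL+sh also honours '\' escapes and quoting, which are omitted here.

```python
def expand(text, variables, active=frozenset()):
    """Expand $name words from `variables`, recursing so that a macro's
    own variables are resolved; a name already being expanded is a loop."""
    words = []
    for word in text.split():
        name = word[1:] if word.startswith("$") else None
        if name in variables:
            if name in active:              # looping is prevented here
                raise ValueError("variable loop: $" + name)
            words.append(expand(variables[name], variables, active | {name}))
        else:
            words.append(word)
    return " ".join(words)
```

Run against the example above, $my_update expands through $new_name in one pass, and a variable that names itself is rejected instead of recursing forever.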


We have been using this utility on a trial basis for the last few weeks. In general, the added convenience of query recall and editing far exceeds the cost of its overhead. The ability to echo the field length and type results in far fewer references to the schema listing, and record and field name completion means fewer typos in general.

References

[1] C.J. Date, "An Introduction To Database Systems", Addison-Wesley, 1986.