Fred Brack
Raleigh, NC

Using Rexx to Manage a Database Application

by Fred Brack
Last Updated

Introduction

A database is a file of related information. For our purposes, creating our own databases and not using fancy programs like SQL, typically we think of a database as consisting of records, each with identical field definitions. That is, each record typically contains of the same type of information, be it names and addresses, sports team information, movie titles, etc.; and within the record, there are fields which give us the various pieces of information we want to store about each related item, such as first name, last name, street address, or movie title, rating, genre, etc.

Using Rexx, we can create a "flat file" (an ordinary file of text) where each individual line represents one record, and we can separate the various characteristics (fields) by some delimiter which we can search for in our program to break the fields apart. How we do this is the subject of this page.

A Movie Database Example

I track movies of a certain characteristic for a website I manage, so I'll use movie titles as an example. Suppose we wish to track the following characteristic of each movie:

Title
Year Released
Rating
Genre

SORTING is an important topic that we will get into. You can see why we might like to sort by title, but we also might like to list movies by year of release or rating, for example. Just keep that in mind for now.

Historically we have "fixed length records" and "variable length records." The advantage of FIXED length is that you know exactly where each field starts and ends and how long the entire record is, while the advantage of VARIABLE length is much less space to store the entire database. A database record length is the sum of its field lengths; so if you have to allow for the longest possible value for each field, the record length (and thus database) can get really big. For example, within Netflix there is a movie titled "9" and one titled "Jim & Andy: The Great Beyond - Featuring a Very Special, Contractually Obligated Mention of Tony Clifton"! So I hope you see the advantage of variable length fields, which is what we will be using.

When you enter the fields into a database record, you need to separate them in some manner. Obviously blanks don't typically work well, unless you quote some or all fields; so it is very typical to use a tab character as a separator (though you could use any character not expected to be found in any field). So we'll go with a tab, which is represented by the hexadecimal code 09 ('09'x in Rexx terms).

Let's get our database started. It will have a filename, a place to store a record just read in, a count of the number of lines read from the database, and variable names for each field.

/* Movie database definition */
filename    = 
"movies.db"
inputline   = ""
movies      = 0
movietitle  
= ""
movieyear   = ""
movierating = ""
moviegenre  = ""

tab         = '09'x

Comments on the variables above:

I like to use all lowercase letters for variable names (and all UPPERCASE for Rexx commands). Sometime I separate "words" in a variable name with underscores, such as input_line. Others like biCapitalization (or camelCase), such as inputLine. Do what you want, but be consistent!
I'm an organizational freak, so I line up the columns! It makes code a lot easier to read.
The extension of the movies database can be anything - txt, dat, etc. - but db makes the most sense to me, especially since it is not an ordinary text file. Use what you want.
I list all the variables I plan to use throughout the program here, to help me with consistent names and as a quick reference as to what names I chose.
I'm torn between naming numeric variables as above (movies) or adding the number sign for clarity (movies#). Again, choose and stick with it. I think I wish I had started with always using the # sign!
I initialize text (string) data to null and numeric data to 0, except for literals (fixed values) like tab.
While I could have named the fields title, year, etc., I chose to prefix them with "movie" to make it crystal clear what they represent. For instance, if I just used year, would that mean the year of the movie or the current year?

Creating and Updating the Movie Database

Well ... that's a bit more complicated, so we are going to skip over it for now. But let's assume you have already created the beginnings of the database with a text editor and have manually separated the fields yourself by hitting the tab key.

A Private War(tab)2018(tab)Not Rated(tab)Drama The Hitman's Bodyguard(tab)2017(tab)R(tab)Action & Adventure Tyler Perry's a Madea Family Funeral(tab)2019(tab)PG-13(tab)Comedy

Reading the Database

How do you read a database? Typically with the LINEIN function. And while you are at it, why not increment the count of the number of lines read?

inputline = LINEIN(filename) /* ==> A Private War(tab)2018(tab)Not Rated(tab)Drama */ movies = movies+1

So now the first "record" of your database is in the variable inputline. The next step is to break it apart into its respective fields. You have two choices:

Use the POS (position) function to find the tab, then take the data in front of it as the first field and the data after it as the remaining fields to reprocess to find the next field.
Use the PARSE instruction to break the line down into all the fields at once. I choose that!

Here's how you do it. (To see all the PARSE options, see my page, The Rexx PARSE Instruction.)

PARSE VAR inputline movietitle (tab) movieyear (tab) movierating (tab) moviegenre

Yep, that's it, nice and tidy. By putting the variable tab in parens you are telling the PARSE instruction to use it as the delimiter instead of the default blank. You MUST put the variable name tab in parens, though, or alternatively spell it out each time as '09'x.

What Could Go Wrong?!

As every programmer knows, a lot of coding goes into error-checking. For example, if you are expecting a number and you plan to do math with it, it is much better to issue an error message as opposed to having the program blow up because it can't do what you want. So what could go wrong here?

In the first place, you might read a record with no tabs, in which case the entire line would go into movietitle, with the other three fields set to null. You can catch this by examining the line for at least one tab prior to the PARSE, or you can detect a null field for the other three fields. Once you read in the individual fields, you may need to error check them too. They could be blank or invalid in some manner (such as text like "tbd" instead of a 4-digit year).

When I create databases, I like the convention of being able to add COMMENTS to my database. I do this by placing an asterisk (*) in column one. I check for this on input and discard the line (via an ITERATE within a DO loop, as I show below). So you MAY wish to check for both an asterisk and a tab to determine what to do next. And while you are at it, you probably want to ignore null lines also.

What's Next?

Once you have parsed a line of input into its respective fields, you do whatever you want with them: validation, extraction, printing, storing in a stem for sorting, etc. Keep in mind that if you want to do anything with the data in an order other than it comes in, you need to sort the data; and to sort the data, you need to choose which field you want to sort on.

There are three basic sorting methods:

An internal sort, via Rexx code you import to your program. (Here's one sort...)
An external Rexx Utility subroutine (SysStemSort in the Rexx Utility Package which comes with Rexx)
The DOS SORT command (you simply call it from your program with operands, such as: 'sort' filename '/o' sortedfilename)

Since the database contains variable length fields separated by tabs, none of these approaches will work by themselves on the data as stored except for sorting by the first field, movietitle. Let's suppose you want to sort by movieyear, then movietitle within movieyear. You COULD read the file, PARSE into fields, and write the fields out to a different file in a different order: movieyear(tab)movietitle(tab)movierating(tab)moviegenre. Then SORT that file, read it back in, reparsing using the new order, and do as you please with the data. That's just one approach, just something to think about! Beyond that you are probably going to have to deal with stems and stem sorts. The Rexx Utility Package has a sort called 'RegMultiStemSort' that will deal with multiple stems simultaneously, keeping them all in sync.

A Complete Example (minus sorting)

Here then is some code that will read in our sample database and print out the results (without sorting).

/* TestMoviesDB.rexx */
filename    = 
"movies.db"
inputline   = ""
movies      = 0
movietitle  = ""
movieyear   = ""
movierating = ""
moviegenre  = ""
tab         = '09'x
SAY "Movie Title 
Database"
DO WHILE LINES(filename)>0 /* that is, until end-of-file */
   inputline = LINEIN(filename)
   IF 
LEFT(inputline,1)="*" | LEFT(inputline,1)="" THEN ITERATE
   IF POS(tab,inputline)=0 THEN DO
      SAY 
"Error, this input line missing tabs:" inputline
      ITERATE
   END
   
movies 
= movies+1
   PARSE VAR inputline movietitle (tab) movieyear (tab) 
movierating (tab) moviegenre
   /* fiddle with the fields here if you like ... 
*/
   /* store the fields here if you like, something like: 
movie.title.movies = movietitle */
   SAY movies"." movietitle "["movieyear";" 
movierating";" moviegenre"]"
END 
SAY "We processed" movies "movies 
successfully"
EXIT

Footnote to the example: I suggested above that you might store fields in the format movie.title.movies = movietitle. Let me comment further on that. This assumes you are creating a stem variable named 'movie.' and further qualifying that stem with another level which here I called 'title' (as opposed to 'year', 'rating', and 'genre', which would be alternates). The 'movies' in the example is the index into the stem -- the nth movie title. Assuming (that's a dangerous word and the point of this footnote) that you don't have a variable named 'title', it will be used as-is (that is, the word 'title') for every entry. But if you happen to define a variable named 'title', then it will change and you will be in trouble! Thus it would be better to adopt some convention in situations like this to "name" the level with a word that would not be used as a variable; for instance, 'titlexxx' or '@title' or something like that.

Updating the Database

Ah, this is, as they say ... an exercise left up to the user! It's a fun project, though, and once you write a general utility to update one database, it will be easy to adapt to a different database. Just try to use generic variable names as much as possible. Basically you will write an interface asking the user if they want to ADD, DELETE, MODIFY, or LIST data, then read the database to find the appropriate record(s) to list, modify, or delete, or add a new record to the end of the database and sort it.

Before We Leave ...

I hope this page was useful to you; but if you have suggestions for improvement, please contact me via the email address below!

Return to Introduction to Programming in the Rexx Language