Fred Brack
Raleigh, NC

The Rexx PARSE Instruction

by Fred Brack
Last Updated

Introduction

The dictionary typically defines "parse" in terms of breaking down sentence structure. For our purposes, I like the definition found at the Computer Hope website:

To parse data or information means to break it down into component parts so that its syntax can be analyzed, categorized, and understood.

In other words, in most cases we parse a line of data to break it down into its component parts, which in database (DB) terminology would be called fields. In Rexx terms, we generally use PARSE to break down a line of data into individual variables. So if you think of a line in a file (a "record" in DB terminology) as a series of fields, we can use PARSE to access the individual fields in our program. We may also use PARSE to operate on a single variable in our program to further analyze its components, such as separating "July 21, 2020" into month, day, and year, or breaking down "07/21/2020" with the intent of converting it to "2020-07-21".

Unfortunately, the current Rexx manual gives very little information about PARSE. In fact, here's what it says:

"A complete description of parsing is given in chapter [not yet written]."

Thus this page ...

Formal Definition of Rexx PARSE

    PARSE [ option ] [ CASELESS ] type [ template ]

    where:

    option = { UPPER | LOWER }

    type = { ARG | LINEIN | PULL | SOURCE | VERSION | VALUE [ expr ] WITH | VAR symbol }

So based on the above, only "type" is required (and in some cases it is all you need). Here is the definition of each operand.

CASELESS means that evaluation of whatever is to be parsed is done without regard to case
option is either UPPER or LOWER, which means the input is converted to that case prior to any parsing (by default, no conversion is done)
type can be any of the following and is required (because it determines what PARSE does!):
- ARG is used in a subroutine to evaluate the passed operands from CALL
- LINEIN means PARSE reads a line from the standard input stream (typically the console)
- PULL means PARSE reads a line from the stack or, if the stack is empty, from the standard input stream (typically the console)
- SOURCE and VERSION refer to evaluating short lines of information about the system environment (like platform and version)
- VALUE means that whatever follows (expr) will be evaluated and used as input (WITH ends the string to be evaluated). While it is a variation of VAR, the difference is that "expr" is evaluated in line, rather than simply used as-is.
- VAR, the most useful option to me, specifies a VARiable name that you want parsed
template is the definition of how you want the line broken down (this can be quite complex and is the whole reason for devoting a page to PARSE!)
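
To make the difference between VAR and VALUE concrete, here is a small sketch. The variable names and sample strings are made up for illustration; the point is only that VAR parses the contents of a variable as-is, while VALUE evaluates an expression first and parses the result.

```rexx
fullname = "Fred Brack"

/* VAR: parse the current contents of a variable */
PARSE VAR fullname first last

/* VALUE: the expression before WITH is evaluated, then parsed */
/* here 2+3 "apples" evaluates to the string "5 apples"         */
PARSE VALUE 2+3 "apples" WITH count item

SAY first last count item
```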

A Simple Example

Note: In this and other examples, we refer to the "stream." The stream is simply where we read or write data. For input, the stream could be the console (which means your keyboard), the stack (which is an internal storage area that you control for temporary storage of one or more lines), or a file. For output, the stream could be the console or a file. That said, here is a very simple example of using PARSE:

PARSE PULL

means PULL (read) a line of input from the stream (typically the console). However, since there is no template defined, the input will be discarded when control is returned to your program. You might use this to pause a program until the user presses the Enter key. However, a more typical use of this structure would be something like this:


SAY "Do you want to continue? (y/n)"
PARSE PULL reply
IF reply="n" THEN EXIT

We Are Not Going to Talk About ...

CASELESS, LOWER, SOURCE, VERSION, or LINEIN, because these are either obvious or not particularly useful. However, I will mention that

PARSE UPPER PULL template

is identical to:

PULL template

if you are looking for a shortcut. And I might point out that when looking for simple one-character responses like "y" or "n", you would be wise to either use UPPER or PULL and then compare against "Y" instead of "y" so you don't need to worry about whether or not the user capitalized the response.
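
Here is a minimal sketch of that advice. So that it runs without anyone typing, it uses QUEUE to place a lowercase "y" on the stack as a stand-in for the user's reply; that simulation is an assumption of this example, not something you would do in a real prompt.

```rexx
QUEUE "y"                  /* simulate the user typing a lowercase y */
PARSE UPPER PULL reply1    /* input is uppercased before parsing     */

QUEUE "y"
PULL reply2                /* shortcut for PARSE UPPER PULL          */

IF reply1 = "Y" & reply2 = "Y" THEN SAY "Continuing ..."
```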

The Template

This is the "meat" of the PARSE instruction! There are actually many options available for specifying a template. They may look complex, but if you understand the options, you can save yourself a lot of time processing data.

In its simplest form, a template specifies one or more variables into which PARSE will break down the input line. You should note that if there are more variables in the template than PARSE needs, that's OK, and the variables will be set to null. And if there are more pieces of data in the input line than you have specified variables for, whatever remains will be tacked on to the last variable in the template. Here are examples of these principles. Initially we'll be operating against this input variable definition:

    input = "Fred Brack, Raleigh, NC"
    bar = "|"

Template With Variables Only

PARSE VAR input firstname lastname city state
SAY firstname bar lastname bar city bar state bar
==> Fred | Brack, | Raleigh, | NC |

There are two things of note here. First, there are no blanks preceding or following any of the variable values. Second, the commas were included in the variable data. So what happened is that PARSE scanned the line for ONE OR MORE blanks (the default action) and broke it down into four pieces of data exactly as found between the blanks (or for the last variable, end of line). If there were 5 blanks before "Raleigh," in the input, the result would be the same.

If you added one or more additional variables to the end of the PARSE instruction, they would be set to null because there was no data left to match.
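
As a quick illustration, assuming a hypothetical fifth variable named zip, the sketch below shows the extra variable coming back empty while the first four are filled as before:

```rexx
input = "Fred Brack, Raleigh, NC"

/* zip is an extra variable with no data left to match */
PARSE VAR input firstname lastname city state zip

/* brackets make the empty value visible */
SAY "state=[" || state || "] zip=[" || zip || "]"
```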

But what happens when you have fewer variables than words separated by blanks? Let's drop a variable (state) and see what happens:

PARSE VAR input firstname lastname city
SAY firstname bar lastname bar city bar state bar
==> Fred | Brack, | Raleigh, NC | |

So since PARSE ran out of variables compared to the input line, it dumped the remaining data into the last specified variable, city, and state was untouched (we assume it was set to null to begin).

Pattern Notation

So now let's deal with those pesky commas! Keep in mind that we are simplifying things for purpose of examples, so in reality you might not be able to deal with commas as we are about to show. What we are going to do now is tell PARSE to look for those commas and exclude them from the result. Basically we are going to change the default of a blank as a PARSE search field to something else partway through the PARSE: a comma. This is called a pattern.

PARSE VAR input firstname lastname "," city "," state
SAY firstname bar lastname bar city bar state bar
==> Fred | Brack |  Raleigh |  NC |

Well, we got rid of the commas, but we also introduced a blank in front of the city and state. That's because we overrode the default blank separator and replaced it with only a comma, which meant that PARSE assumed that the character after the comma was part of the next field up to the next delimiter. So do you see how to correct the problem? We change our delimiter to a comma followed by a blank:

PARSE VAR input firstname lastname ", " city ", " state
SAY firstname bar lastname bar city bar state bar
==> Fred | Brack | Raleigh | NC |

Alternatively we could have stuck with just the comma and used the STRIP function to remove the blanks.
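
A minimal sketch of that alternative, using the same input as before: the pattern stays a bare comma, and STRIP removes the leading blank from city and state afterward.

```rexx
input = "Fred Brack, Raleigh, NC"
bar = "|"

PARSE VAR input firstname lastname "," city "," state

/* city and state each start with a blank at this point */
city  = STRIP(city)
state = STRIP(state)

SAY firstname bar lastname bar city bar state bar
```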

IMPORTANT NOTE: When you specify a pattern, you have two options:

- Specify it in quotes (single or double) as shown above
- Specify it as a variable you have set, but the variable name MUST BE ENCLOSED IN PARENS!

Thus we could alter our previous example in this manner:

delim = ", "
PARSE VAR input firstname lastname (delim) city (delim) state
SAY firstname bar lastname bar city bar state bar
==> Fred | Brack | Raleigh | NC |
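
The same technique handles the date conversion mentioned in the Introduction. This is only a sketch; the variable names usdate, slash, and isodate are invented for the example.

```rexx
usdate = "07/21/2020"
slash  = "/"

/* the pattern is held in a variable, so it goes in parentheses */
PARSE VAR usdate mm (slash) dd (slash) yyyy

isodate = yyyy || "-" || mm || "-" || dd
SAY isodate
```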

Specifying Columns

You have just seen how we can tell PARSE to use something other than the default blank in its search for the next field. Well, you can also tell it to break down your line by columns. This could be useful if you created a database with each field beginning in a fixed column. Here is our data this time, with the first three fields taking up 15 characters each:


input = "Brack          Fred           Raleigh        NC"
/* cols: 00000000011111111112222222222333333333344444444
         12345678901234567890123456789012345678901234567 */

Since we know exactly where each field begins, we can specify it in our PARSE statement:

PARSE VAR input 1 lastname 16 firstname 31 city 46 state
SAY firstname bar lastname bar city bar state bar
==> Fred            | Brack           | Raleigh         | NC |

You can now eliminate the blanks in each field with the STRIP or SPACE function, if you wish. NOTE that the "1" is not required, as we begin in column 1 by default, but specifying it adds clarity. HOWEVER, the number 1 can look like a lowercase L (l), so you may wish to use the optional format of putting an equal sign in front of the 1 or any number, as in '=1'.
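
Putting both suggestions together, here is a sketch that uses the "=" form for every column number and STRIP to drop the padding. To avoid any doubt about the exact number of blanks, the record is built with LEFT, which pads each field to 15 characters; that construction is a convenience for this example only.

```rexx
/* build the same fixed-column record: three 15-character fields, then NC */
input = LEFT("Brack",15) || LEFT("Fred",15) || LEFT("Raleigh",15) || "NC"
bar = "|"

PARSE VAR input =1 lastname =16 firstname =31 city =46 state

SAY STRIP(firstname) bar STRIP(lastname) bar STRIP(city) bar state bar
```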

It was convenient that we could use just one column number between each variable, because we wanted all the data on the line. Suppose, however, we only wanted lastname and state. Here's how we would code it:

PARSE VAR input 1 lastname 16 46 state
SAY lastname bar state bar
==> Brack           | NC |

So we read this template as saying: Put columns 1-15 (that's 15 characters - one less than the 16 we used) in 'lastname' and column 46 through the end of the line in 'state'. So now it becomes clearer that while the number in FRONT of a variable name is the starting column, the number AFTER the variable is one beyond the ending column (which might or might not be the start of the next field). If that number has a variable name after it, that next variable starts in that column; otherwise you have to "move along" by specifying the next starting column, which is what we did in this example with "16 46."

Fancy! You don't have to keep increasing the column number; you can "restart" the PARSE anywhere you want. For example:
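
A minimal sketch of such a restart, assuming the same fixed-column record as above (built here with LEFT so the padding is exact). The SPACE call used to tidy fullname is one possible choice and is an assumption of this sketch.

```rexx
input = LEFT("Brack",15) || LEFT("Fred",15) || LEFT("Raleigh",15) || "NC"
bar = "|"

/* after firstname, "1" backs up to column 1; fullname gets columns 1-30 */
PARSE VAR input 1 lastname 16 firstname 31 1 fullname 31 city 46 state

fullname = SPACE(fullname)   /* collapse runs of blanks, trim the ends */

SAY fullname bar STRIP(city) bar state bar
```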

So here after we extracted lastname and firstname, we backed up to column 1 and took 30 columns (31-1) for fullname (later eliminating all blanks for convenience), then moved along for city and state. It would have worked the same if that 1 fullname 31 were put anywhere else on the line, even at the end.

My Rexx pal Paul Kislanko points out an interesting use for columns on his Parallel Assignments page: setting multiple variables to the same value by resetting the template to column 1 repeatedly using the VALUE ... WITH options:

PARSE VALUE 0 WITH x 1 y 1 z

The "0" is the value for each of the variables x, y, and z.

And Chip Davis wrote in to suggest the following example which adds a new value to a stem and updates the stem index at the same time. Assume a stem named "stem." and the next value for the stem in variable "nextvalue".

PARSE VALUE stem.0+1 nextvalue WITH =1 n stem.n =1 stem.0 .
            ------------------      ----------- -----------
            the VALUE to parse      1st pass    2nd pass (the trailing period is important)

The value to be parsed is the combination of the next stem index (stem.0+1)* concatenated with the new value to be added to the stem (nextvalue). So, for example, that might be "2 John Doe". The coding after the WITH parses that value twice (which is why "=1" appears twice, and the "=" sign helps clarify it is a one and not the letter L): first with n (which will resolve as 2 in our example), which will be the new index value for the stem to get to the spot for the next value ("John Doe"), which then appears next in line as stem.n (stem.2); and having done that, PARSE backs up to the first position of VALUE and takes the first value (2) as the new stem.0, and ignores the second value because a period is coded as the placeholder. Whew!

Chip writes: "At the very least, this atomizes a two-statement operation into a single statement (making it suitable for "Then" or "Else" clauses), removes the responsibility for indexing of the stem from the Do-loop (and thus the 'stem.0 = stem.0 - 1' that is often necessary after 'End'), and eliminates a host of problematic loop exit logic."

* Note that when writing something like stem.0+1, the math is done after the stem index is resolved; so you can think of it as (stem.0) + 1 if that makes it clearer.
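
To see Chip's statement at work, the sketch below applies it twice to a stem that starts out empty. The starting values and the two sample names are assumptions made for the demonstration.

```rexx
stem.  = ""     /* default for all elements */
stem.0 = 0      /* nothing in the stem yet  */

nextvalue = "John Doe"
PARSE VALUE stem.0+1 nextvalue WITH =1 n stem.n =1 stem.0 .

nextvalue = "Jane Roe"
PARSE VALUE stem.0+1 nextvalue WITH =1 n stem.n =1 stem.0 .

/* stem.0 now holds the count; stem.1 and stem.2 hold the names */
SAY stem.0 bar stem.1 bar stem.2
```

Note that bar is not set in this fragment, so set bar = "|" first if you run it on its own.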

Other Template Considerations

You are free to intersperse pattern and column notations in the same template if it helps.
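
For instance, here is a sketch that mixes the two notations on a made-up record layout (a date followed by a name): patterns split the date on its hyphens, and a column number marks where the date ends and the name begins.

```rexx
rec = "2020-07-21 Fred Brack"

/* "-" patterns split the date; column 11 ends day and starts the name */
PARSE VAR rec year "-" month "-" day 11 firstname lastname

SAY year month day firstname lastname
```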

If you want to skip something on the line, you can use a period in place of a variable name. Thus:

input = "apples pears bananas peaches lemons"
PARSE VAR input fruit1 . fruit3 .
SAY fruit1 fruit3
==> apples bananas

You need that period at the end to "absorb" all remaining data on the line and discard it.

If you would like a bit more geeky explanation of template options and some new ideas, see my friend Paul's page on templates!

Using PARSE to Load a Stem Variable

One of the interesting ideas Paul discusses is what he calls a "destructive parse," which can be used to load a stem (array). You create a variable consisting of all the eventual array values (let's call them fields) separated by some common delimiter (blanks are fine if the elements don't have blanks in them). You then parse that variable with a two-variable template: PARSE assigns the first field to the first variable in the template (which will be the next stem slot), while the second and last variable in the template picks up all the remaining fields. Reparse the remainder to get the second field assigned, and so on. Thus it is easy to add fields to or remove them from the string while letting the program adapt by dynamically building the stem array.

To do this, use the WORDS or COUNTSTR function to find the number of data elements in your string, then loop that many times, with each PARSE picking off the first value (field), sticking it in the next stem slot, and leaving the remainder of the string to process on the next pass. If your data elements contain blanks, use COUNTSTR to count them and specify the delimiter as a pattern in the template below.

colors = "red green blue orange yellow"
color. = ""
color.0 = WORDS(colors)    /* that would be 5 */
DO i = 1 TO color.0
  PARSE VAR colors color.i colors
END
==> leaves you with colors="", color.0=5, color.1="red", color.2="green", etc.
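
When the fields themselves contain blanks, the same loop works with a delimiter pattern. The sketch below assumes a semicolon delimiter and invented color names; note that the number of fields is one more than the number of delimiters that COUNTSTR finds.

```rexx
colors = "dark red;light green;sky blue"
delim  = ";"
color. = ""

/* two delimiters separate three fields */
color.0 = COUNTSTR(delim, colors) + 1

DO i = 1 TO color.0
  /* first field goes to color.i; the rest goes back into colors */
  PARSE VAR colors color.i (delim) colors
END

SAY color.0 "|" color.1 "|" color.2 "|" color.3
```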

PARSE ARG

I'm going to cover this one briefly because it tripped me up at first. It looks simple, but there is a catch. You use PARSE ARG in a Subroutine to capture the arguments passed by the CALL statement. For example:

CALL MySub "Fred","Brack"
EXIT    /* stop here so we don't fall through into MySub */

MySub:
    PARSE ARG firstname,lastname
    SAY firstname lastname
RETURN

That works fine. In fact, you can even put a space after the comma in either the CALL or PARSE ARG template. But if you separate the operands of CALL or the PARSE template by a blank instead of a comma, you might not get the desired results. Recommendation: always use a comma to separate operands.
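
To show the catch, here is a sketch with a hypothetical subroutine named ShowArgs that prints what it received. The ARG() function, called with no operands, returns the number of arguments passed. With a comma, CALL passes two arguments; with a blank, the two strings are concatenated into a single argument, so the second template variable comes back empty.

```rexx
CALL ShowArgs "Fred", "Brack"   /* two arguments               */
CALL ShowArgs "Fred" "Brack"    /* one argument: "Fred Brack"  */
EXIT

ShowArgs:
  PARSE ARG firstname, lastname
  SAY "first=[" || firstname || "] last=[" || lastname || "] args=" || ARG()
RETURN
```

The first call shows both names in their own variables; the second shows "Fred Brack" in firstname and nothing in lastname.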

Before We Leave ...

I hope this page was useful to you; but if you have suggestions for improvement, please contact me via the email address below!
