Playing around with PROC TABULATE
As we have seen in a previous post, the syntax of SAS PROC TABULATE can be challenging. However, as a statistical programmer it is essential to know this SAS procedure.
There are several important statements to manipulate in PROC TABULATE. these statements are:
- VAR
- CLASS
- TABLE
In addition, in the TABLE statement you can use commas as operators.
So, let’s see that through a first example:
Let’s first create the dataset on which we will work.
data subjects ;
retain USUBJID AGE LOCATION TREATMENT ;
length TREATMENT $25 ;
input USUBJID $ AGE LOCATION $ TREATMENT $ ;
datalines;
SUBJ1 50 STOMACH TREATMENT1
SUBJ2 60 PANCREAS TREATMENT1
SUBJ3 51 STOMACH TREATMENT1
SUBJ4 55 PANCREAS TREATMENT2
SUBJ5 40 PANCREAS TREATMENT2
SUBJ6 45 STOMACH TREATMENT2
;
run;
Here is what our that’s it looks like:
You can see our first example of PROC TABULATE.
PROC TABULATE DATA=TEMP;
VAR AGE;
TABLE AGE;
RUN;
Here we have two important statements: TABLE and VAR.
All variables in the TABLE statement must appear in the VAR statement
The default descriptive statistic will be calculated on the AGE variable. That default descriptive statistic is the SUM.
Please see below the result of our first PROC TABULATE:
Now what if you want to calculate another statistic. Let’s say that you want to go calculate the mean of the variable AGE.
Please see below the PROC TABULATE syntax that we will use to do that.
PROC TABULATE DATA=subjects;
VAR AGE;
TABLE AGE*MEAN;
RUN;
… and here is your result:
TABLE AGE*MEAN;
That is the part of the syntax where we specify that we want to calculate the mean.
What if you want to calculate two statistic simultaneously?
Well… it’s not so complicated.
Once again, we must manipulate the TABLE statement.
PROC TABULATE DATA=subjects;
VAR AGE;
TABLE AGE*N AGE*MEAN;
RUN;
Notice that we had to use two asterisks in order to calculate simultaneously the two statistics.
There is however a more elegant way to formulate this syntax.
That imply using parentheses in the TABLE statement.
Please see below:
PROC TABULATE DATA=subjects;
VAR AGE;
TABLE RENT*(N MEAN);
RUN;
Up until now what we have seen are things that you can do with other SAS procedures. The specificity of PROC TABULATE resides in the CLASS statement.
By using the CLASS statement, you can produce concise table that shows values for various subgroups.
The other thing that we have to do is tell the TABULATE procedure where to put the classification variable (in this example TREATMENT) in the table. We do this by again using the asterisk operator. By adding another asterisk to the end of the TABLE statement, and following it with the variable TREATMENT, PROC TABULATE knows that TREATMENT will be used to categorize the mean values of AGE.
So, let’s illustrate that with an example.
PROC TABULATE DATA=subjects;
CLASS TREATMENT;
VAR AGE;
TABLE AGE*MEAN*TREATMENT;
RUN;
That’s great but we can make the syntax and the result a bit more elegant.
This implies to output a 2-dimensional table. For this we will make use of the comma operator.
In the TABLE statement what is on the left of the comma operator defines the rows and what is on the right of the comma operator defines the columns.
In the example below the rows are populated with the treatment variable.
The columns are populated with the descriptive statistics for AGE (N and MEAN).
PROC TABULATE DATA=subjects;
VAR AGE;
CLASS TREATMENT;
TABLE TREATMENT, AGE*N AGE*MEAN;
RUN;