Monday, July 6, 2009

SAS: Identify duplicate and nonduplicate observations in a SAS data set

We can achieve this by using the DUPOUT= option on the PROC SORT statement.

/* The code below keeps the first observation for each value of age in no_dup and writes every later observation with a repeated age to a new data set called dups */
/***********************************************************/
Proc Sort data=sashelp.class out=no_dup dupout=dups nodupkey;
by age;
Run;

/***********************************************************/

SAS LOG:
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: 13 observations with duplicate key values were deleted.
NOTE: The data set WORK.NO_DUP has 6 observations and 5 variables.
NOTE: The data set WORK.DUPS has 13 observations and 5 variables.
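
The counts in the log can be cross-checked without sorting anything. The query below is only a sketch, and the column aliases n_obs, n_keys and n_removed are made up for illustration. It counts all observations, the distinct values of age, and the difference between the two, which should line up with the number of observations read, the size of no_dup, and the size of dups.

```sas
/* Sketch: cross-check the PROC SORT log counts for sashelp.class */
proc sql;
  select count(*)                       as n_obs,     /* observations read           */
         count(distinct age)            as n_keys,    /* one row per age in no_dup   */
         count(*) - count(distinct age) as n_removed  /* rows sent to the dups set   */
  from sashelp.class;
quit;
```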
Note:
The DUPOUT= option is effective only when used with the NODUPKEY or NODUPRECS (alias NODUP) option. Without one of
these options, the log will show a WARNING message and the DUPOUT= data set will be created with 0 records.
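
Keep in mind that with NODUPKEY the first observation of each BY group always lands in the OUT= data set, so the DUPOUT= data set holds only the extra copies, not every observation whose key is repeated. If you want all observations with a repeated key in one data set and the truly unique ones in another, one possible approach (a different technique from DUPOUT=, shown here only as a sketch) is a DATA step with FIRST. and LAST. variables. The data set names class_sorted, unique_age and repeated_age are made up for this example.

```sas
/* Sort by the key first; BY-group processing needs sorted input */
proc sort data=sashelp.class out=class_sorted;
  by age;
run;

/* Split on whether the age value occurs once or more than once */
data unique_age repeated_age;
  set class_sorted;
  by age;
  /* first.age and last.age are both 1 only when the BY group has a single observation */
  if first.age and last.age then output unique_age;
  else output repeated_age;
run;
```

Here unique_age receives the observations whose age appears exactly once, and repeated_age receives every observation whose age is shared with at least one other observation.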
