1) \d refers to any digit (0 - 9),
2) \D to any non-digit, and
3) \w to any word character (A-Z, a-z, 0-9, and _).
4) The three metacharacters *, +, and ? are particularly useful because they control how many times the preceding item may repeat: * means zero or more times, + one or more times, and ? zero or one time.*/
/*The example below extracts phone numbers from a string */
/*To identify a phone number, use the expressions below:
\( matches an open parenthesis; \d\d\d matches three digits
\) matches a closed parenthesis; a blank followed by ? matches zero or one blank
\d\d\d matches three digits; - matches a dash
\d{4} matches four digits */
/***************************************************************/
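DATA EXTRACT;
   /* Compile the pattern once, on the first iteration only. */
   IF _N_ = 1 THEN DO;
      PATTERN = PRXPARSE ("/\(\d\d\d\) ?\d\d\d-\d{4}/");
   END;
   RETAIN PATTERN;
   LENGTH NUMBER $ 15;
   INPUT STRING $CHAR80.;
   /* START is 0 when no match is found; only the first match in STRING is reported. */
   CALL PRXSUBSTR(PATTERN,STRING,START,LENGTH);
   IF START GT 0 THEN DO;
      NUMBER = SUBSTR(STRING,START,LENGTH);
      /* Remove the optional blank after the area code. */
      NUMBER = COMPRESS(NUMBER," ");
      OUTPUT;
   END;
   KEEP NUMBER START LENGTH;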
DATALINES;
THIS LINE DOES NOT HAVE ANY PHONE NUMBERS ON IT
THIS LINE DOES: (123)345-4567 LA DI LA DI LA
ALSO VALID (123) 999-9999
TWO NUMBERS HERE (333)444-5555 AND (800)123-4567
;
/**************************************************************/
The ‘PRXPARSE’ function compiles a Perl regular expression and returns a pattern identifier that the other PRX functions and call routines use.
Syntax: PRXPARSE (perl-regular-expression)
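As a small illustration of PRXPARSE together with the metacharacters listed at the top, here is a minimal sketch that compiles one pattern and tests a few strings with the PRXMATCH function. PRXMATCH does not appear in the example above; the pattern and the names PATTERN_ID, TEXT, and POSITION are made up for this illustration.

```sas
/* Sketch only: the pattern and variable names are illustrative assumptions. */
DATA _NULL_;
   /* \d+ = one or more digits, -? = an optional dash, \w* = zero or more word characters */
   IF _N_ = 1 THEN PATTERN_ID = PRXPARSE("/\d+-?\w*/");
   RETAIN PATTERN_ID;
   INPUT TEXT $CHAR20.;
   /* PRXMATCH returns the position where the first match begins, or 0 if there is none. */
   POSITION = PRXMATCH(PATTERN_ID, TEXT);
   PUT TEXT= POSITION=;
   DATALINES;
ABC
X 12-AB
99
;
```

The first line contains no digit, so POSITION should be 0 for it; for the other two lines the log should show where the first digit starts.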
The ‘PRXSUBSTR’ call routine returns the starting position of the first match and, optionally, its length. The starting position is 0 when no match is found.
Syntax: CALL PRXSUBSTR(pattern-id, string, start, length)
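Because PRXSUBSTR reports only the first match, the line with two phone numbers yields just one row. One possible way to collect every match is the CALL PRXNEXT routine, which is not used in the original example. The sketch below reuses the same pattern; the data set name ALL_NUMBERS and the variables SCAN_FROM, SCAN_TO, and LEN are hypothetical names chosen for this illustration.

```sas
/* Sketch only: one way to extract every phone number on a line with CALL PRXNEXT. */
DATA ALL_NUMBERS;
   IF _N_ = 1 THEN PATTERN = PRXPARSE("/\(\d\d\d\) ?\d\d\d-\d{4}/");
   RETAIN PATTERN;
   LENGTH NUMBER $ 15;
   INPUT STRING $CHAR80.;
   /* Search window: PRXNEXT moves SCAN_FROM past each match it finds. */
   SCAN_FROM = 1;
   SCAN_TO = LENGTH(STRING);
   CALL PRXNEXT(PATTERN, SCAN_FROM, SCAN_TO, STRING, START, LEN);
   DO WHILE (START GT 0);
      /* Cut out the match and drop the optional blank, as in the example above. */
      NUMBER = COMPRESS(SUBSTR(STRING, START, LEN), " ");
      OUTPUT;
      CALL PRXNEXT(PATTERN, SCAN_FROM, SCAN_TO, STRING, START, LEN);
   END;
   KEEP NUMBER START LEN;
   DATALINES;
TWO NUMBERS HERE (333)444-5555 AND (800)123-4567
;
```

With this approach each match becomes its own observation, so the input line above should produce two rows instead of one.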
Run the above code in SAS to understand how this works.
For more details visit:
http://support.sas.com/documentation/cdl/en/lrdict/59540/HTML/default/a002288677.htm