BLOCXX_NAMESPACE::PerlRegEx Class Reference

Perl compatible Regular Expression wrapper class and utility functions.

#include <PerlRegEx.hpp>

Public Types

typedef blocxx::Array< int > MatchVector
 Native PCRE vector of integers.
typedef blocxx::Array< match_t > MatchArray
 POSIX RegEx like match array with captured substring offsets.

Public Member Functions

 PerlRegEx ()
 Create a new PerlRegEx object without compilation.
 PerlRegEx (const String &regex, int cflags=0)
 Create a new PerlRegEx object and compile the regular expression.
 PerlRegEx (const PerlRegEx &ref)
 Create a new PerlRegEx as (deep) copy of the specified reference.
 ~PerlRegEx ()
 Destroy this PerlRegEx object.
PerlRegEx & operator= (const PerlRegEx &ref)
 Assign the specified PerlRegEx reference.
bool compile (const String &regex, int cflags=0)
 Compile the regular expression pattern contained in the string.
int errorCode ()
 Return the last error code generated by compile or one of the executing methods.
String errorString () const
 Return the error message string for the last error code.
String patternString () const
int compileFlags () const
bool isCompiled () const
StringArray capture (const String &str, size_t index=0, size_t count=0, int eflags=0)
 Search in string and return an array of captured substrings.
String replace (const String &str, const String &rep, bool global=false, int eflags=0)
 Replace (substitute) the first or all matching substrings.
StringArray split (const String &str, bool empty=false, int eflags=0)
 Split the specified string into an array of substrings.
StringArray grep (const StringArray &src, int eflags=0)
 Match all strings in the array against regular expression.
bool match (const String &str, size_t index=0, int eflags=0) const
 Execute regular expression matching against the string.
bool execute (MatchVector &sub, const String &str, size_t index=0, size_t count=0, int eflags=0)
 Execute regular expression matching against the string.
bool execute (MatchArray &sub, const String &str, size_t index=0, size_t count=0, int eflags=0)

Private Attributes

pcre * m_pcre
int m_flags
int m_ecode
String m_error
String m_rxstr

Classes

struct  match_t
 POSIX RegEx like structure for captured substring offset pair.


Detailed Description

Perl compatible Regular Expression wrapper class and utility functions.

The PerlRegEx implementation depends on the availability of the pcre library.

Consult the pcre_compile(3), pcre_exec(3) and pcreapi(3) manual pages for details of the pcre implementation.

Note:
This class does NOT wrap all features provided by the pcre library!

Definition at line 59 of file PerlRegEx.hpp.


Member Typedef Documentation

typedef blocxx::Array<int> BLOCXX_NAMESPACE::PerlRegEx::MatchVector

Native PCRE vector of integers.

It contains pairs of captured substring offsets. Each even index holds the start offset and the following odd index holds the corresponding end offset of the matched substring.
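
As a minimal sketch of this layout, the snippet below walks a flat offset vector and extracts the substring each pair delimits. It uses std::vector<int> and std::string as stand-ins for blocxx::Array<int> and String, and the helper name substringsFromOffsets is hypothetical, not part of blocxx.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turn a flat vector of (start, end) offset pairs
// into the substrings they delimit. std::vector<int> stands in for
// MatchVector and std::string for blocxx::String.
std::vector<std::string> substringsFromOffsets(const std::string &str,
                                               const std::vector<int> &offsets)
{
    std::vector<std::string> out;
    for (std::size_t i = 0; i + 1 < offsets.size(); i += 2)
    {
        const int start = offsets[i];     // even index: start offset
        const int end   = offsets[i + 1]; // odd index: end offset
        if (start < 0 || end < start)
        {
            out.push_back(std::string()); // unused pair, e.g. (-1, -1)
            continue;
        }
        out.push_back(str.substr(start, end - start));
    }
    return out;
}
```

An unused pair is mapped to an empty string here so that the result keeps one entry per pair; a caller that needs to tell "unused" from "matched empty" would have to keep the offsets as well.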

Definition at line 67 of file PerlRegEx.hpp.

typedef blocxx::Array<match_t> BLOCXX_NAMESPACE::PerlRegEx::MatchArray

POSIX RegEx like match array with captured substring offsets.

Definition at line 76 of file PerlRegEx.hpp.


Constructor & Destructor Documentation

BLOCXX_NAMESPACE::PerlRegEx::PerlRegEx (  ) 

Create a new PerlRegEx object without compilation.

Definition at line 188 of file PerlRegEx.cpp.

BLOCXX_NAMESPACE::PerlRegEx::PerlRegEx ( const String & regex,
int  cflags = 0 
)

Create a new PerlRegEx object and compile the regular expression.

Parameters:
regex A perl regular expression pattern.
cflags Bitwise-or of compile() flags.
Exceptions:
RegExCompileException on compilation failure.

Definition at line 197 of file PerlRegEx.cpp.

References BLOCXX_THROW_ERR, compile(), errorString(), and m_ecode.

BLOCXX_NAMESPACE::PerlRegEx::PerlRegEx ( const PerlRegEx & ref  ) 

Create a new PerlRegEx as (deep) copy of the specified reference.

If the reference is compiled, the new object will be compiled as well.

Parameters:
ref The PerlRegEx object reference to copy.
Exceptions:
RegExCompileException on compilation failure.

Definition at line 211 of file PerlRegEx.cpp.

References BLOCXX_THROW_ERR, compile(), errorString(), m_ecode, m_flags, m_pcre, and m_rxstr.

BLOCXX_NAMESPACE::PerlRegEx::~PerlRegEx (  ) 

Destroy this PerlRegEx object.

Definition at line 225 of file PerlRegEx.cpp.

References m_pcre.


Member Function Documentation

PerlRegEx & BLOCXX_NAMESPACE::PerlRegEx::operator= ( const PerlRegEx & ref  ) 

Assign the specified PerlRegEx reference.

If the reference is compiled, the current object will be (re)compiled.

Parameters:
ref The PerlRegEx object reference to assign from.
Exceptions:
RegExCompileException on compilation failure.

Definition at line 237 of file PerlRegEx.cpp.

References BLOCXX_THROW_ERR, compile(), BLOCXX_NAMESPACE::String::erase(), errorString(), m_ecode, m_error, m_flags, m_pcre, and m_rxstr.

bool BLOCXX_NAMESPACE::PerlRegEx::compile ( const String & regex,
int  cflags = 0 
)

Compile the regular expression pattern contained in the string.

Parameters:
regex A regular expression pattern.
cflags Bitwise-or of compilation flags.
Returns:
True on successful compilation, false on failure.
The cflags parameter can be set to one or a bitwise-or of the following option flags. Consult the pcre_compile(3) and pcreapi(3) manual pages for the complete list and detailed description.

Most of the compile options can also be set directly in the pattern string using the (?<option character>) notation, as listed below.

  • i PCRE_CASELESS match upper and lower case letters
  • m PCRE_MULTILINE "^" and "$" match the beginning and end of a line instead of the string
  • s PCRE_DOTALL the dot metacharacter also matches newlines
  • x PCRE_EXTENDED ignore unescaped whitespace
  • U PCRE_UNGREEDY invert "greediness" of quantifiers
  • PCRE_UTF8 causes PCRE to act in UTF-8 mode
  • PCRE_ANCHORED force pattern to be "anchored"
  • PCRE_NO_AUTO_CAPTURE behave as if each "(" parenthesis is followed by "?:"
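
The way option flags combine with bitwise-or can be sketched with the C++ standard library. This is an analogy only: std::regex is used as a stand-in for PerlRegEx, std::regex::icase plays the role of PCRE_CASELESS, and std::regex::nosubs is a rough counterpart of PCRE_NO_AUTO_CAPTURE. The helper name matchesIgnoringCase is hypothetical.

```cpp
#include <regex>
#include <string>

// std::regex stand-in, not the blocxx API: compile 'pattern' with two
// option flags combined by bitwise-or, as with the cflags parameter,
// and report whether it matches anywhere in 'str'.
bool matchesIgnoringCase(const std::string &str, const std::string &pattern)
{
    const std::regex re(pattern,
                        std::regex::ECMAScript |
                        std::regex::icase |   // compare case-insensitively
                        std::regex::nosubs);  // do not store sub-matches
    return std::regex_search(str, re);
}
```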

Definition at line 262 of file PerlRegEx.cpp.

References BLOCXX_NAMESPACE::String::c_str(), BLOCXX_NAMESPACE::String::erase(), m_ecode, m_error, m_flags, m_pcre, and m_rxstr.

Referenced by operator=(), and PerlRegEx().

int BLOCXX_NAMESPACE::PerlRegEx::errorCode (  ) 

Return the last error code generated by compile or one of the executing methods.

In case of a compile error, the returned value is the position (character offset) in the regex pattern string where the error was discovered.

In all other cases, the result of the pcre_exec function call is returned.

Returns:
pcre_exec result or compile error position.

Definition at line 294 of file PerlRegEx.cpp.

References m_ecode.

String BLOCXX_NAMESPACE::PerlRegEx::errorString (  )  const

Return the error message string for the last error code.

Returns:
The error message or empty string if no expression was compiled.

Definition at line 302 of file PerlRegEx.cpp.

References m_error.

Referenced by capture(), grep(), match(), operator=(), PerlRegEx(), replace(), and split().

String BLOCXX_NAMESPACE::PerlRegEx::patternString (  )  const

Returns:
The regular expression pattern string.

Definition at line 310 of file PerlRegEx.cpp.

References m_rxstr.

int BLOCXX_NAMESPACE::PerlRegEx::compileFlags (  )  const

Returns:
The compilation flags used in compile() method.

Definition at line 318 of file PerlRegEx.cpp.

References m_flags.

bool BLOCXX_NAMESPACE::PerlRegEx::isCompiled (  )  const

Returns:
true, if the current regex object is compiled.

Definition at line 326 of file PerlRegEx.cpp.

References m_pcre.

bool BLOCXX_NAMESPACE::PerlRegEx::execute ( MatchVector & sub,
const String & str,
size_t  index = 0,
size_t  count = 0,
int  eflags = 0 
)

Execute regular expression matching against the string.

The matching starts at the specified index and returns true on a match, or false if no match is found.

Note:
In contrast to the PosixRegEx class, the PCRE library supports a string index (startoffset) and is able to look behind the starting point. If the regex makes use of the "start of string/line" metacharacter (^), the regex may not match if index is greater than 0.
The expected maximum number of matching substrings can be specified in count. If the default value of 0 is used, the count detected by pcre_fullinfo is used.
Note:
If the specified count is greater than 0 but smaller than the actual number of matches found, false is returned (failure, error code 0). If the specified count is greater than 0 and greater than the actual number of matches found, the unused offsets at the end are set to -1.
If no match was found, the sub array will be empty and false is returned. If a match is found and the expression was compiled to capture substrings, the sub array will be filled with the captured substring offsets. The first (index 0) offset pair points to the start of the first match and the end of the last match. Unused / optional capturing subpattern offsets will be set to -1.

The resulting MatchVector holds twice as many elements as there are captured substrings; the resulting MatchArray holds exactly as many.
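
The size relation between the two result types, and the -1 padding for unused entries, can be sketched as follows. The struct MatchPair is a stand-in for match_t (only the rm_so and rm_eo member names are taken from the example in this section), std::vector replaces blocxx::Array, and the helper toMatchArray is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for PerlRegEx::match_t; the rm_so / rm_eo member names
// follow the example in this section.
struct MatchPair
{
    int rm_so; // start offset, -1 if unused
    int rm_eo; // end offset, -1 if unused
};

// Hypothetical helper: fold a flat offset vector into offset pairs and
// pad with (-1, -1) up to 'count' entries when count is larger.
std::vector<MatchPair> toMatchArray(const std::vector<int> &flat,
                                    std::size_t count = 0)
{
    std::vector<MatchPair> out;
    for (std::size_t i = 0; i + 1 < flat.size(); i += 2)
    {
        MatchPair p = { flat[i], flat[i + 1] };
        out.push_back(p);
    }
    while (out.size() < count)
    {
        MatchPair unused = { -1, -1 };
        out.push_back(unused);
    }
    return out;
}
```

The sketch only models the padding case. The documented failure when count is smaller than the number of matches is not reproduced here.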

Consult the pcre_exec(3) and pcreapi(3) manual pages for the complete and detailed description.

Parameters:
sub array for substring offsets
str string to match
index match string starting at index
count number of expected substring matches
eflags execution flags described below
Returns:
true on match or false
Exceptions:
RegExCompileException if regex is not compiled.
AssertionException if the count value is too big (would cause integer overflow).
OutOfBoundsException if the index is greater than the string length.
The eflags parameter can be set to 0 or one or a bitwise-or of the following options:

  • PCRE_NOTBOL The circumflex character (^) will not match the beginning of string.
  • PCRE_NOTEOL The dollar sign ($) will not match the end of string.
  • PCRE_ANCHORED Match only at the first position
  • PCRE_NOTEMPTY An empty string is not a valid match
  • PCRE_NO_UTF8_CHECK Do not check the string for UTF-8 validity (only relevant if PCRE_UTF8 was set at compile time)
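
The effect of a "not beginning of line" flag when matching starts at an index can be illustrated with the C++ standard library. This is an analogy under stated assumptions: std::regex stands in for PerlRegEx, std::regex_constants::match_not_bol stands in for PCRE_NOTBOL, and the helper searchFrom is hypothetical. Unlike PCRE with a start offset, std::regex treats the first iterator as the start of the subject unless a flag says otherwise, which is why the flag makes a visible difference here.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Search 'str' from 'index' onward with std::regex (a stand-in for
// PerlRegEx). When notBol is true, match_not_bol is passed so that "^"
// does not match at the first searched position.
bool searchFrom(const std::string &str, const std::string &pattern,
                std::size_t index, bool notBol)
{
    const std::regex re(pattern);
    std::regex_constants::match_flag_type flags =
        std::regex_constants::match_default;
    if (notBol)
    {
        flags |= std::regex_constants::match_not_bol;
    }
    return std::regex_search(str.begin() + index, str.end(), re, flags);
}
```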
Example:
 String      str("foo = bar trala hoho");

 MatchVector vsub;
 if( PerlRegEx("=").execute(vsub, str) && !vsub.empty())
 {
   //
   // vsub[0] is 4,
   // vsub[1] is 5
   //
 }
 
 MatchArray  rsub;
 if( PerlRegEx("=").execute(rsub, str) && !rsub.empty())
 {
   //
   // rsub[0].rm_so is 4,
   // rsub[0].rm_eo is 5
   //
 }

Definition at line 404 of file PerlRegEx.cpp.

References BLOCXX_THROW, BLOCXX_NAMESPACE::String::c_str(), BLOCXX_NAMESPACE::getError(), i, BLOCXX_NAMESPACE::String::length(), m_ecode, m_error, and m_pcre.

Referenced by capture(), replace(), and split().

bool BLOCXX_NAMESPACE::PerlRegEx::execute ( MatchArray & sub,
const String & str,
size_t  index = 0,
size_t  count = 0,
int  eflags = 0 
)

StringArray BLOCXX_NAMESPACE::PerlRegEx::capture ( const String & str,
size_t  index = 0,
size_t  count = 0,
int  eflags = 0 
)

Search in string and return an array of captured substrings.

Parameters:
str string to search in
index match string starting at index
count expected substring count
eflags execution flags, see execute()
Returns:
array of captured substrings
Exceptions:
RegExCompileException if regex is not compiled.
RegExExecuteException on execute failures.
OutOfBoundsException if the index is greater than the string length.
Example:
 String      str("Foo = bar trala hoho");
 PerlRegEx   reg("^((?i)[a-z]+)[ \t]*=[ \t]*(.*)$");
 StringArray out = reg.capture(str);
 //
 // out is { "Foo = bar trala hoho",
 //          "Foo",
 //          "bar trala hoho"
 //        }

Definition at line 473 of file PerlRegEx.cpp.

References BLOCXX_THROW, BLOCXX_THROW_ERR, errorString(), execute(), i, m_ecode, m_pcre, match(), BLOCXX_NAMESPACE::Array< T >::push_back(), and BLOCXX_NAMESPACE::String::substring().

blocxx::String BLOCXX_NAMESPACE::PerlRegEx::replace ( const String & str,
const String & rep,
bool  global = false,
int  eflags = 0 
)

Replace (substitute) the first or all matching substrings.

Substrings matching the regular expression are replaced with the string provided in rep, and a new, modified string is returned. If no matches are found, a copy of the str string is returned.

The rep string can contain capturing references "\\1" to "\\9" that will be substituted with the corresponding captured string. An additional "\\" prepended to the reference disables (skips) the substitution. Note that in C++ source code a reference is written with a double backslash followed by a digit ("\\1"), not as "\1", which the compiler would interpret as an escape sequence in the same way as "\n".
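
The reference expansion can be sketched as below. This is not the blocxx implementation (the page only names a substitute_caps() helper without showing it). The function expandReferences is hypothetical, std::string and std::vector replace the blocxx types, and two behaviors are assumptions: an escaped reference is copied through as literal text, and a reference to a missing capture expands to nothing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not the blocxx implementation: expand the
// references \1 to \9 in 'rep' with the matching entry of 'caps'
// (caps[0] is the whole match).
std::string expandReferences(const std::string &rep,
                             const std::vector<std::string> &caps)
{
    std::string out;
    for (std::size_t i = 0; i < rep.size(); ++i)
    {
        const char c  = rep[i];
        const char n1 = (i + 1 < rep.size()) ? rep[i + 1] : '\0';
        const char n2 = (i + 2 < rep.size()) ? rep[i + 2] : '\0';
        if (c == '\\' && n1 == '\\' && n2 >= '1' && n2 <= '9')
        {
            // Escaped reference: keep it as literal text (assumed behavior).
            out += '\\';
            out += n2;
            i += 2;
        }
        else if (c == '\\' && n1 >= '1' && n1 <= '9')
        {
            // Plain reference: insert the capture if it exists.
            const std::size_t idx = static_cast<std::size_t>(n1 - '0');
            if (idx < caps.size())
            {
                out += caps[idx];
            }
            ++i;
        }
        else
        {
            out += c; // ordinary character
        }
    }
    return out;
}
```

With captures { "Foo = bar", "Foo", "bar" }, the C++ literal "\\2=\\1" expands to "bar=Foo", while "\\\\1" yields the two characters backslash and 1.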

Parameters:
str string that should be matched
rep replacement substring with optional references
global whether to replace all matches (true) or only the first one (false)
eflags execution flags, see execute() method
Returns:
new string with modification(s)
Exceptions:
RegExCompileException if regex is not compiled.
RegExExecuteException on execute failures.
OutOfBoundsException if the index is greater than the string length.
Example:
 String      str("//foo/.//bar/hoho");
 PerlRegEx   reg("([/]+(\\.?[/]+)?)");
 String      out = reg.replace(str, "/", true);
 //
 // out is "/foo/bar/hoho"
 //

Definition at line 518 of file PerlRegEx.cpp.

References BLOCXX_THROW, BLOCXX_THROW_ERR, BLOCXX_NAMESPACE::String::erase(), errorString(), execute(), BLOCXX_NAMESPACE::String::length(), m_ecode, m_error, m_pcre, match(), BLOCXX_NAMESPACE::substitute_caps(), and BLOCXX_NAMESPACE::String::substring().

StringArray BLOCXX_NAMESPACE::PerlRegEx::split ( const String & str,
bool  empty = false,
int  eflags = 0 
)

Split the specified string into an array of substrings.

The regular expression is used to match the separators.

If the empty flag is true, empty substrings are included in the resulting array.

If no separators were found, and the empty flag is true, the array will contain the input string as its only element. If the empty flag is false, an empty array is returned.

Parameters:
str string that should be split
empty whether to capture empty substrings
eflags execution flags, see execute() method
Returns:
array of resulting substrings or empty array on failure
Exceptions:
RegExCompileException if regex is not compiled.
RegExExecuteException on execute failures.
OutOfBoundsException if the index is greater than the string length.
Example:
 String      str("1.23, .50 , , 71.00 , 6.00");
 StringArray out1 = PerlRegEx("([ \t]*,[ \t]*)").split(str);
 //
 // out1 is { "1.23", ".50", "71.00", "6.00" }
 //
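
The separator-based splitting shown in the example can be approximated with the C++ standard library. This is a sketch under stated assumptions: std::regex and std::sregex_token_iterator stand in for PerlRegEx, the helper splitOn is hypothetical, and the corner cases of the real class (no separator found, execution errors) are not modeled.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split 'str' on every match of 'separator',
// optionally keeping empty fields. std::regex stands in for PerlRegEx.
std::vector<std::string> splitOn(const std::string &str,
                                 const std::string &separator,
                                 bool keepEmpty = false)
{
    const std::regex re(separator);
    std::vector<std::string> out;
    // Submatch index -1 selects the text between separator matches.
    std::sregex_token_iterator it(str.begin(), str.end(), re, -1), end;
    for (; it != end; ++it)
    {
        const std::string field = it->str();
        if (keepEmpty || !field.empty())
        {
            out.push_back(field);
        }
    }
    return out;
}
```

For the input of the example above, the field between the two adjacent commas is empty, so it is dropped by default and kept only when keepEmpty is true.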

Definition at line 571 of file PerlRegEx.cpp.

References BLOCXX_THROW, BLOCXX_THROW_ERR, BLOCXX_NAMESPACE::String::empty(), BLOCXX_NAMESPACE::String::erase(), errorString(), execute(), BLOCXX_NAMESPACE::String::length(), m_ecode, m_error, m_pcre, match(), BLOCXX_NAMESPACE::Array< T >::push_back(), and BLOCXX_NAMESPACE::String::substring().

StringArray BLOCXX_NAMESPACE::PerlRegEx::grep ( const StringArray & src,
int  eflags = 0 
)

Match all strings in the array against regular expression.

Returns an array of matching strings.

Parameters:
src list of strings to match
eflags execution flags, see execute() method
Exceptions:
RegExCompileException if regex is not compiled.
RegExExecuteException on execute failures.
OutOfBoundsException if the index is greater than the string length.
Example:
 StringArray src;
 src.push_back("\t");
 src.push_back("one");
 src.push_back("");
 src.push_back("two");
 src.push_back("  ");
 StringArray out = PerlRegEx("[^ \t]").grep(src);
 //
 // out is { "one", "two" }
 //
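
The filtering performed in this example amounts to keeping every string in which the pattern matches somewhere. A minimal standard-library sketch, with std::regex as a stand-in for PerlRegEx and the hypothetical helper name grepStrings:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the elements of 'src' in which 'pattern'
// matches at least once. std::regex stands in for PerlRegEx.
std::vector<std::string> grepStrings(const std::vector<std::string> &src,
                                     const std::string &pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (const std::string &s : src)
    {
        if (std::regex_search(s, re))
        {
            out.push_back(s);
        }
    }
    return out;
}
```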

Definition at line 628 of file PerlRegEx.cpp.

References BLOCXX_NAMESPACE::Array< T >::begin(), BLOCXX_THROW, BLOCXX_THROW_ERR, BLOCXX_NAMESPACE::Array< T >::empty(), BLOCXX_NAMESPACE::Array< T >::end(), BLOCXX_NAMESPACE::String::erase(), errorString(), BLOCXX_NAMESPACE::getError(), i, m_ecode, m_error, m_pcre, and BLOCXX_NAMESPACE::Array< T >::push_back().

bool BLOCXX_NAMESPACE::PerlRegEx::match ( const String & str,
size_t  index = 0,
int  eflags = 0 
) const

Execute regular expression matching against the string.

The matching starts at the specified index and returns true on a match, or false if no match is found.

See execute() method for description of the index and eflags parameters.

Parameters:
str string to match
index match string starting at index
eflags execution flags, see execute() method
Returns:
true on match or false
Exceptions:
RegExCompileException if regex is not compiled.
RegExExecuteException on execute failures.
OutOfBoundsException if the index is greater than the string length.
Example:
 String      str("foo = bar ");
 if( PerlRegEx("^[a-z]+[ \t]*=[ \t]*.*$").match(str))
 {
 }

Definition at line 666 of file PerlRegEx.cpp.

References BLOCXX_THROW, BLOCXX_THROW_ERR, BLOCXX_NAMESPACE::String::c_str(), BLOCXX_NAMESPACE::String::erase(), errorString(), BLOCXX_NAMESPACE::getError(), BLOCXX_NAMESPACE::String::length(), m_ecode, m_error, and m_pcre.

Referenced by capture(), replace(), and split().


Member Data Documentation

int BLOCXX_NAMESPACE::PerlRegEx::m_flags [private]

Definition at line 446 of file PerlRegEx.hpp.

Referenced by compile(), compileFlags(), operator=(), and PerlRegEx().

int BLOCXX_NAMESPACE::PerlRegEx::m_ecode [mutable, private]

Definition at line 448 of file PerlRegEx.hpp.

Referenced by compile(), errorString(), execute(), grep(), match(), operator=(), replace(), and split().

String BLOCXX_NAMESPACE::PerlRegEx::m_rxstr [private]

Definition at line 449 of file PerlRegEx.hpp.

Referenced by compile(), operator=(), patternString(), and PerlRegEx().


The documentation for this class was generated from the following files:

Generated on Wed Feb 25 19:05:08 2009 for blocxx by  doxygen 1.5.6