pcrepartial man page
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.

PARTIAL MATCHING IN PCRE

In normal use of PCRE, if the subject string that is passed to
pcre_exec() or pcre_dfa_exec() matches as far as it goes, but is
too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
are circumstances where it might be helpful to distinguish this case from other
cases in which there is no match.
Consider, for example, an application where a human is required to type in data
for a field with specific formatting requirements. An example might be a date
in the form ddmmmyy, defined by this pattern:
  ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
If the application sees the user's keystrokes one by one, and can check that
what has been typed so far is potentially valid, it is able to raise an error
as soon as a mistake is made, possibly beeping and not echoing the
character that has been typed. This immediate feedback is likely to be a better
user interface than a check that is delayed until the entire string has been
entered.
PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
option, which can be set when calling pcre_exec() or pcre_dfa_exec(). When
this flag is set for pcre_exec(), the return
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
during the matching process the last part of the subject string matched part of
the pattern. Unfortunately, for non-anchored matching, it is not possible to
obtain the position of the start of the partial match. No captured data is set
when PCRE_ERROR_PARTIAL is returned.
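To make the three possible outcomes concrete for the date pattern above, here
is a minimal hand-written sketch in C. It does not call PCRE at all: the names
classify_date(), after_day() and the enum values are hypothetical, chosen only
for illustration, and the treatment of an empty subject as a partial match is
an assumption of this sketch, not documented PCRE behaviour. It only shows the
idea of separating "could still become a match" from "can never match".

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical outcome names; they stand in for a complete match,
   PCRE_ERROR_PARTIAL and PCRE_ERROR_NOMATCH but are not PCRE constants. */
enum outcome { NO_MATCH, PARTIAL_MATCH, FULL_MATCH };

static const char *months[] = { "jan", "feb", "mar", "apr", "may", "jun",
                                "jul", "aug", "sep", "oct", "nov", "dec" };

/* Classify what follows the day digits: a month name, then two year digits. */
static enum outcome after_day(const char *s)
{
    size_t n = strlen(s);
    enum outcome best = NO_MATCH;

    for (size_t i = 0; i < 12; i++) {
        if (n < 3) {
            /* The subject ended inside the month name: is it a prefix? */
            if (strncmp(s, months[i], n) == 0)
                best = PARTIAL_MATCH;
        } else if (strncmp(s, months[i], 3) == 0) {
            size_t yn = n - 3;              /* characters after the month */
            if (yn > 2)
                return NO_MATCH;            /* too long for \d\d$ */
            for (size_t k = 0; k < yn; k++)
                if (!isdigit((unsigned char)s[3 + k]))
                    return NO_MATCH;
            return yn == 2 ? FULL_MATCH : PARTIAL_MATCH;
        }
    }
    return best;
}

/* Classify a subject against ^\d?\d(jan|...|dec)\d\d$ by trying both a
   one-digit and a two-digit day and keeping the better outcome. */
enum outcome classify_date(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);

    if (n == 0)
        return PARTIAL_MATCH;   /* assumption: nothing typed, nothing wrong */
    if (!isdigit((unsigned char)s[0]))
        return NO_MATCH;

    enum outcome best = after_day(s + 1);           /* one-digit day */
    if (n >= 2 && isdigit((unsigned char)s[1])) {   /* two-digit day */
        enum outcome alt = after_day(s + 2);
        if (alt > best)
            best = alt;
    }
    return best;
}
```

With this sketch, classify_date("25jun04") is a full match, "25dec3" and "3ju"
are partial matches, and "3juj" and "j" are not matches. An application of the
kind described above would reject a keystroke as soon as the result drops to
NO_MATCH.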
When PCRE_PARTIAL is set for pcre_dfa_exec(), the return code
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
subject is reached, there have been no complete matches, but there is still at
least one matching possibility. The portion of the string that provided the
partial match is set as the first matching string.
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
last literal byte in a pattern, and abandons matching immediately if such a
byte is not present in the subject string. This optimization cannot be used
for a subject string that might match only partially.

RESTRICTED PATTERNS FOR PCRE_PARTIAL

Because of the way certain internal optimizations are implemented in the
pcre_exec() function, the PCRE_PARTIAL option cannot be used with all
patterns. These restrictions do not apply when pcre_dfa_exec() is used.
For pcre_exec(), repeated single characters such as
  a{2,4}
and repeated single metasequences such as
  \d+
are not permitted if the maximum number of occurrences is greater than one.
Optional items such as \d? (where the maximum is one) are permitted.
Quantifiers with any values are permitted after parentheses, so the invalid
examples above can be coded thus:
  (a){2,4}
  (\d)+
These constructions run more slowly, but for the kinds of application that are
envisaged for this facility, this is not felt to be a major restriction.
If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
pcre_exec() returns the error code PCRE_ERROR_BADPARTIAL (-13).

EXAMPLE OF PARTIAL MATCHING USING PCRETEST

If the escape sequence \P is present in a pcretest data line, the
PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that
uses the date example quoted above:
    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data> 25jun04\P
   0: 25jun04
   1: jun
  data> 25dec3\P
  Partial match
  data> 3ju\P
  Partial match
  data> 3juj\P
  No match
  data> j\P
  No match
The first data string is matched completely, so pcretest shows the
matched substrings. The remaining four strings do not match the complete
pattern, but the first two are partial matches. The same test, using DFA
matching (by means of the \D escape sequence), produces the following output:
    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data> 25jun04\P\D
   0: 25jun04
  data> 23dec3\P\D
  Partial match: 23dec3
  data> 3ju\P\D
  Partial match: 3ju
  data> 3juj\P\D
  No match
  data> j\P\D
  No match
Notice that in this case the portion of the string that was matched is made
available.

MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()

When a partial match has been found using pcre_dfa_exec(), it is possible
to continue the match by providing additional subject data and calling
pcre_dfa_exec() again with the PCRE_DFA_RESTART option and the same
working space (where details of the previous partial match are stored). Here is
an example using pcretest, where the \R escape sequence sets the
PCRE_DFA_RESTART option and the \D escape sequence requests the use of
pcre_dfa_exec():
    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data> 23ja\P\D
  Partial match: 23ja
  data> n05\R\D
   0: n05
The first call has "23ja" as the subject, and requests partial matching; the
second call has "n05" as the subject for the continued (restarted) match.
Notice that when the match is complete, only the last part is shown; PCRE does
not retain the previously partially-matched string. It is up to the calling
program to do that if it needs to.
This facility can be used to pass very long subject strings to
pcre_dfa_exec(). However, some care is needed for certain types of
pattern.
1. If the pattern contains tests for the beginning or end of a line, you need
to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
subject string for any call does not contain the beginning or end of a line.
2. If the pattern contains backward assertions (including \b or \B), you need
to arrange for some overlap in the subject strings to allow for this. For
example, you could pass the subject in chunks that were 500 bytes long, but in
a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
bytes at the start of the buffer.
3. Matching a subject string that is split into multiple segments does not
always produce exactly the same result as matching over one single long string.
The difference arises when there are multiple matching possibilities, because a
partial match result is given only when there are no completed matches in a
call to pcre_dfa_exec(). This means that as soon as the shortest match has
been found, continuation to a new subject segment is no longer possible.
Consider this pcretest example:
    re> /dog(sbody)?/
  data> do\P\D
  Partial match: do
  data> gsb\R\P\D
   0: g
  data> dogsbody\D
   0: dogsbody
   1: dog
The pattern matches the words "dog" or "dogsbody". When the subject is
presented in several parts ("do" and "gsb" being the first two) the match stops
when "dog" has been found, and it is not possible to continue. On the other
hand, if "dogsbody" is presented as a single string, both matches are found.
Because of this phenomenon, it does not usually make sense to end a pattern
that is going to be matched in this way with a variable repeat.
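The overlap arrangement described in point 2 above can be sketched in C as
follows. This is only an illustration of the buffer handling and makes no PCRE
calls; struct segment_buffer, segment_load(), OVERLAP and CHUNK are
hypothetical names, and the sizes are simply the 200 and 500 bytes used in the
text. The value returned by segment_load() is what a caller would pass as the
starting offset, with the length field as the subject length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OVERLAP 200   /* bytes of earlier text kept for backward assertions */
#define CHUNK   500   /* maximum number of new bytes presented per call */

/* Hypothetical helper type: a 700-byte buffer holding up to 200 bytes of
   retained context followed by up to 500 bytes of new subject data. */
struct segment_buffer {
    char   data[OVERLAP + CHUNK];
    size_t context;   /* retained bytes at the start of data */
    size_t length;    /* total valid bytes in data (context + new bytes) */
};

/* Load the next n bytes (n <= CHUNK) of the subject. The tail of the
   previous buffer contents, at most OVERLAP bytes, is moved to the front
   so that it precedes the new data. Returns the offset at which the new
   data starts. A zero-initialized buffer is valid for the first call. */
size_t segment_load(struct segment_buffer *b, const char *src, size_t n)
{
    assert(b != NULL && src != NULL && n <= CHUNK);

    size_t keep = b->length < OVERLAP ? b->length : OVERLAP;

    /* memmove, not memcpy: source and destination may overlap. */
    memmove(b->data, b->data + b->length - keep, keep);
    memcpy(b->data + keep, src, n);

    b->context = keep;
    b->length  = keep + n;
    return keep;
}
```

On the first call there is no earlier text, so the returned offset is 0. On
later calls with full chunks the offset is 200 and the buffer holds 700 bytes,
as in the description above.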
Last updated: 28 February 2005
Copyright © 1997-2005 University of Cambridge.
