Telelogic DOORS
Topic Title: Help with regular expression function
Created On: 8-Oct-2004 16:55
 8-Oct-2004 16:55


Louie Landale

Posts: 2070
Joined: 12-Sep-2002

I had a function "fGetOffset(InString, SubString)" that searched for SubString at every valid position of InString: for (i = 0; i <= length(InString) - length(SubString); i++), comparing at each position and returning i on the first match.

I decided to expand my horizons and use regular expressions. In testing, the new version (attached) was about 4 times faster for short strings and about 40 times faster for strings of 300+ characters. It seemed to work.

But the attached demo fails in the 4th case. I suspect the regular expression fails because there is a "|" character in the SubString. I don't want to insert a "Regexp re = regexp(SubString)" in the function, since I've shown that it increases the run time quite a bit.

Can anybody debug or explain this for me? Can I write a generic quick function like this?

- Louie
 8-Oct-2004 19:51


ron lewis

Posts: 650
Joined: 20-Sep-2004

The or symbol '|' is part of the regular expression match syntax, and it is interpreted that way even though you pass it in as a plain string.
 8-Oct-2004 20:31


Louie Landale

Posts: 2070
Joined: 12-Sep-2002

Does that mean there's no way to do this, other than my original way of comparing the SubString to each position in the original string? Is there any way of saying that SubString is to be taken literally?

Is there some sort of "Is this a regular expression-like" function?

The "findRichText()" command has a similar problem, where only the non-rich-text codes in the search string are searched (the Raw Text), ignoring the Rich-Text codes. The "contains" function for buffers has restrictions making it unusable.

Maybe I should try putting the string in a Buffer and use the "contains Buffer Char" function, looking for each character in the SubString.

- Louie
 8-Oct-2004 20:54


ron lewis

Posts: 650
Joined: 20-Sep-2004

Depending upon what you want to do: 

Try \\| in one or both of your strings.

Note that this is two backslashes followed by the or symbol.
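
To illustrate the suggestion above, here is a minimal DXL sketch of a literal search for a substring that contains '|'. The sample string and the variable names are hypothetical, and it assumes the backslash escape behaves as described in this reply.

```dxl
// Sketch only: InString and reLit are illustrative names, not from the attached demo.
string InString = "left|right"

// In DXL source, "\\|" is the two-character pattern \| (an escaped, literal bar).
// Without the escape, '|' would be read as the regular expression "or" operator.
Regexp reLit = regexp "\\|r"

if (reLit InString) {
    // start 0 gives the offset of the whole match within InString.
    int offset = start 0
    print "literal match at offset " offset "\n"
} else {
    print "no match\n"
}
```

The escape has to be applied to every metacharacter in the search string, which is why a general-purpose escaping helper comes up later in the thread.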
 11-Oct-2004 13:15


Paul Tiplady

Posts: 176
Joined: 28-Oct-2003

Escaping stuff is fun. I had to write the attached function to make sure I had all my escape sequences escaped (including the escaped escape sequences) while doing an export utility. A bit of work on this function would allow you to escape all the magic regexp characters, and thus get around the problems you're getting.

-------------------------


Paul dot Tiplady at TRW dot com
TRW Automotive
 11-Oct-2004 15:23


Louie Landale

Posts: 2070
Joined: 12-Sep-2002

Looks like all that defeats the purpose, which is to do this quickly and without memory leaks. Even so, I don't think your function solves the problem where a SubString of "|a" is interpreted to mean "any character or an 'a'", since the "|" character isn't turned into a "\\|" sequence.

I've tried a new technique: put the string in a buffer and then use "contains(Buffer, FirstCharOfSubString)" to find whether the first character exists; then, byte for byte, ensure that the second and later characters all match up. I will find out whether that speeds things up or not.
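
The technique described above might look roughly like the following DXL sketch. It is not the attached function: the name fFindLiteral and its parameters are hypothetical, and it assumes the form of "contains" that takes a Buffer, a char, and a starting offset.

```dxl
// Hypothetical helper (not the attached fGetOffset): returns the offset of
// SubString in buf at or after StartPos, or -1 if it is not found.
int fFindLiteral(Buffer buf, string SubString, int StartPos) {
    int subLen = length(SubString)
    int bufLen = length(buf)
    if (subLen == 0 || subLen > bufLen) return -1

    char first = SubString[0]
    int pos = contains(buf, first, StartPos)   // next candidate position
    int i
    bool same

    while (pos >= 0 && pos + subLen <= bufLen) {
        same = true
        // Compare the remaining characters one by one.
        for (i = 1; i < subLen; i++) {
            if (buf[pos + i] != SubString[i]) {
                same = false
                break
            }
        }
        if (same) return pos
        pos = contains(buf, first, pos + 1)    // try the next occurrence
    }
    return -1
}

// Usage: no escaping is needed because nothing is treated as a pattern.
Buffer b = create
b = "x|a|b"
int off = fFindLiteral(b, "|b", 0)
print off "\n"
delete b
```

Since every character is compared literally, regular expression metacharacters in SubString need no special handling.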

- Louie
 12-Oct-2004 10:45


Ross Morgan

Posts: 74
Joined: 15-Apr-2004

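try this function...
// Returns a copy of sIn with each nuisance character preceded by a backslash.
string convert_for_re(string sIn)
{
    string sReturn = sIn
    string sTemp = sIn
    string sAccum = ""
    string sResidual
    int iLen
    // Matches from the first |, ), ], [ or ( to the end of the string.
    Regexp rNuisanceCharacters = regexp "[\\||\\)|\\]|\\[|\\(].*"
    while(rNuisanceCharacters sTemp)
    {
        // Copy the text before the match, then a backslash, then the matched character.
        sAccum = sAccum sTemp[0:(start 0) - 1] "\\" sTemp[start 0:start 0]
        iLen = length(sTemp)
        // Continue with whatever follows the matched character.
        sResidual = sTemp[(start 0) + 1:iLen-1]
        sTemp = sResidual
        sReturn = sAccum sResidual
    }
    return sReturn
}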

and replace...
int Offset = fGetOffset(InString, SubString)
...with...
string sConv = convert_for_re(SubString)
int Offset = fGetOffset(InString, sConv)

you may want to use the lower() function for case independence as well.

still more efficient than using a Buffer, I'll bet.



 12-Oct-2004 16:24


Louie Landale

Posts: 2070
Joined: 12-Sep-2002

I found that the "Regexp re= regexp string" function is EXTREMELY slow and should only be used in repeated searches. In your case, move that line outside the function and make it a global declaration.
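
As a minimal sketch of that advice, the pattern below is compiled once at the top level of the script and reused on every call. The digits pattern and the names gReDigits and fFirstNumberOffset are hypothetical stand-ins, not the pattern from the function above.

```dxl
// Compiled once when the script is loaded, not on every call.
Regexp gReDigits = regexp "[0-9]+"

// Hypothetical helper: offset of the first run of digits in s, or -1.
int fFirstNumberOffset(string s) {
    if (gReDigits s) {
        int offset = start 0
        return offset
    }
    return -1
}

// Each call reuses the already compiled pattern.
int a = fFirstNumberOffset("REQ-1042")
int b = fFirstNumberOffset("no digits here")
print a "\n"
print b "\n"
```

This only helps when the pattern is fixed in advance; a pattern built from a changing SubString would still have to be compiled on each call.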

My new function using "contains(Buff, char FirstOfSubString)" seems to run as fast as my older attempt using a regular expression (if (SubString, InString) then match(0)...), but it also works when SubString contains Regexp metacharacters as well as RichText markup (the Achilles heel of function "findRichText").

Attached, find the function. Not shown are a variety of overloaded functions that support strings and case matching. The StartPosition parameter is there so I can write a pseudo-overloaded "fGetOffsetLast" function that finds the last occurrence of SubString.

I also found that creating and deleting buffers is pretty slow. So the overloaded ones make use of a private global buffer, used to store the value when the user calls fGetOffset with an input string, which is routinely the case.

Yes, non-case-matching involves turning both the Input and the SubString to "lower" before calling the function. It's possible that the function 'cistrcmp' might be a little faster.

Anyway, this function is about 30 times faster than my old one when the InString is 100 characters and the SubString is at the end (or missing).

Thanks for the help.

Hard to believe DXL doesn't have such a perm already defined.

- Louie