Telelogic DOORS
Topic Title: Help with regular expression function
Created On: 8-Oct-2004 16:55
I had a function "fGetOffset(InString, SubString)" that searched for SubString at every valid position of InString: for (i = 0; i <= length(InString) - length(SubString); i++), and if the characters at position i match, it has found it.
I decided to expand my horizons and use regular expressions. In testing, the new version (attached) was about 4 times faster for short strings and 40 times faster for strings of 300+ characters. It seemed to work, but the attached demo fails in the 4th case. I suspect the regular expression fails because there is a "|" character in the SubString. I don't want to insert a "Regexp re = regexp(SubString)" in the function, since I've shown that it increases the time quite a bit. Can anybody debug or explain this for me? Can I write a generic, quick function like this? - Louie
The OR symbol '|' is part of the regular expression match syntax, and it is interpreted that way even though you pass it in as a string.
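The effect is easy to reproduce in most regular expression engines. Here is a minimal sketch in Python rather than DXL (Python's `re` module is used only because its behaviour is easy to check; the sample strings are made up, and DXL's regexp dialect may differ in details). With "a|b" as the pattern, the engine looks for 'a' OR 'b', not for the three-character text:

```python
import re

in_string = "xx b yy a|b zz"   # hypothetical sample text
sub_string = "a|b"             # meant literally, but '|' means OR in a pattern

# Used as a pattern, "a|b" matches the first 'a' or the first 'b'.
regex_offset = re.search(sub_string, in_string).start()

# A plain substring search finds the literal text "a|b".
literal_offset = in_string.find(sub_string)

print(regex_offset, literal_offset)
```

The two offsets differ because the regex match stops at the lone 'b' early in the string, which is the kind of mismatch the 4th test case seems to be hitting.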
Does that mean there's no way to do this, other than my original way of comparing the SubString to each position in the original string? Is there any way of saying that SubString is to be taken literally?
Is there some sort of "does this look like a regular expression" function? The "findRichText()" command has a similar problem, where only the non-rich-text codes in the search string are searched (the raw text), ignoring the rich-text codes. The "contains" function for buffers has restrictions that make it unusable. Maybe I should try putting the string in a Buffer and using the "contains Buffer Char" function, looking for each character in the SubString. - Louie
Depending upon what you want to do:
Try \\| in one or both of your strings. Note that this is two backslashes followed by the | character.
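As a rough illustration of the escaping idea, again in Python rather than DXL: in a string literal, "\\|" is the two characters backslash and bar, which the regex engine then reads as a literal bar. `re.escape` is Python's own helper for this; whether DXL offers an equivalent is not stated in this thread, and the sample strings are made up.

```python
import re

in_string = "xx b yy a|b zz"      # hypothetical sample text

hand_escaped = "a\\|b"            # the engine receives: a \ | b
auto_escaped = re.escape("a|b")   # let the library escape metacharacters

# Both patterns now match only the literal text "a|b".
hand_offset = re.search(hand_escaped, in_string).start()
auto_offset = re.search(auto_escaped, in_string).start()

print(hand_offset, auto_offset)
```

Escaping has to be applied to the search string only. The text being searched is never interpreted as a pattern.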
Escaping stuff is fun.
-------------------------
Paul dot Tiplady at TRW dot com
TRW Automotive
It looks like all of that defeats the purpose, which is to do this quickly and without a memory leak. Even so, I don't think your function solves the problem where a SubString of "|a" is interpreted to mean "any character or an 'a'", since the "|" character isn't turned into a "\\|" sequence.
I've tried a new technique: put the string in a buffer, use "contains(Buffer, FirstCharOfSubString)" to find whether the first character exists, and then check byte-for-byte that the second and later characters all match up. I will find out whether that speeds things up or not. - Louie
Try this function...
    string convert_for_re(string sIn)
    {
        string sReturn = sIn
        string sTemp = sIn
        string sAccum = ""
        string sResidual
        int iLen
        Regexp rNuisanceCharacters = regexp "[\\||\\)|\\]|\\[|\\(].*"
        while (rNuisanceCharacters sTemp)
        {
            sAccum = sAccum sTemp[0 ...   // the rest of this statement was lost from the post
            iLen = length(sTemp)
            sResidual = sTemp[(start 0) + 1:iLen-1]
            sTemp = sResidual
            sReturn = sAccum sResidual
        }
        return sReturn
    }
and replace...
    int Offset = fGetOffset(InString, SubString)
...with...
    string sConv = convert_for_re(SubString)
    int Offset = fGetOffset(InString, sConv)
You may want to use the lower() function for case independence as well. This is still more efficient than using a Buffer, I'll bet.
I found that the "Regexp re = regexp string" function is EXTREMELY slow and should only be used for repeated searches. In your case, move that line outside the function and make it a global declaration.
My new function using "contains(Buff, char FirstOfSubString)" seems to be working as fast as my older attempt using regular expressions (if (SubString, InString) then match(0)...), but it also works when SubString contains Regexp fields as well as RichText fields (the Achilles' heel of the function "findRichText"). Attached, find the function. Not shown are a variety of overloaded functions that support string inputs and case matching. The StartPosition parameter is there so I can write a pseudo-overloaded "fGetOffsetLast" function that finds the last occurrence of SubString.
I also found that creating and deleting buffers is pretty slow, so the overloaded functions make use of a private global buffer, which stores the value when the user calls fGetOffset with a string as input, which is routinely the case. Yes, non-case-matching involves turning both the input and the SubString to "lower" before calling the function. It's possible that the function 'cistrcmp' might be a little faster.
Anyway, this function is about 30 times faster than my old one when the InString is 100 characters and the SubString is at the end (or missing). Thanks for the help. It's hard to believe DXL doesn't have such a function already defined. - Louie
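The attachment is not part of this thread, so the following is only a sketch of the technique as described, written in Python rather than DXL. The names `get_offset` and `get_offset_last` are hypothetical, `str.find` on a single character stands in for DXL's "contains(Buffer, char)", and `start` is assumed to be zero or positive.

```python
def get_offset(in_string, sub_string, start=0):
    """Offset of the first literal occurrence at or after start, or -1."""
    if not sub_string:
        return -1
    first = sub_string[0]
    last_start = len(in_string) - len(sub_string)  # last position that can fit
    pos = in_string.find(first, start)             # jump to the first character
    while pos != -1 and pos <= last_start:
        # Check the second and later characters one by one.
        k = 1
        while k < len(sub_string) and in_string[pos + k] == sub_string[k]:
            k += 1
        if k == len(sub_string):
            return pos
        pos = in_string.find(first, pos + 1)       # try the next candidate
    return -1


def get_offset_last(in_string, sub_string):
    """Offset of the last literal occurrence, built on the start parameter."""
    last = -1
    pos = get_offset(in_string, sub_string)
    while pos != -1:
        last = pos
        pos = get_offset(in_string, sub_string, pos + 1)
    return last
```

Because nothing here is treated as a pattern, a SubString such as "|a" or "a|b" is matched character for character, and the start parameter is what makes the "last occurrence" variant possible.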