![]() |
Telelogic DOORS (steve huntington) | ![]() |
new topic :
profile :
search :
help :
dashboard :
calendar :
home
|
||
Latest News:
|
|
Topic Title: Search Algorithm Topic Summary: Created On: 30-Oct-2008 15:42 Status: Post and Reply |
Linear : Threading : Single : Branch |
![]() |
![]()
|
![]() |
|
I imagine that everyone that has complicated DXL has written a search routine, finding the location of the search string in the in string or buffer. The easy way to do this is to turn the search string into a regular expression, then let it do the search. Something like this:
RegExp re = regexp(SearchString) if (re InBuff) return(match 0)) else return(-1) However, my tests years ago showed that the 'regexp' command was extremely time consuming, and running 1000s of searches was extremely slow. So I wrote an actual fGetOffset(Buffer bufIn, string SearchString, int StartLoc) as follows: [1] using 'contain' look for the first character of SearchString in the buffer, starting at StartLoc. [2] If found, look individually for the first 4 characters of SearchString in the corresponding buffer positions [3] If found, then extract that string portion of the buffer and compare that to the complete SearchString. It occured to me this week end (at the UGC), that perhaps I could use the RegExp method without resorting to 1000s of regexp commands. Basically storing all the RegExp used so far in a Skip, and searching the skip as follows: Skip g_skpGetOffsetREs = createString() // Global skip list, used only by fGetOffset() // KEY: 'string' SearchString; DATA: 'RegExp' corresponding to SearchString. int fGetOffset(Buffer buffIn, string SearchString, int StartLoc) { RegExp re if (!find(g_skpGetOffsetREs, SearchString, re)) re = regexp(SearchString) // First time searching for this, define re else{} // Have searched for this before, used stored re .... } I think this lets us have our cake and eat it too. What do you think? - Louie |
|
![]() |
|
![]() |
|
no good idea - think it over!
|
|
![]() |
|
![]() |
|
Hi Louie,
I also think, that is not an good idea to reuse the regulars ... It looks strange :D Attached, you will find a more efficient way of "contains". While working on strings, I've found a very strange behaviour: Extrating chars from string ([]) needs linear effort - that means it needs 10 times the time to get 10th char than the first one. So looping through large strings cahr by char might be very slow. Text buffers do not have this behaviour! Hope that will be intersting for you ![]() Greetings Reik ------------------------- Evosoft GmbH for Siemens Industry Sector Berlin, Germany |
|
![]() |
|
![]() |
|
Your algorithm fails when the searched string contains some prefix of the search string; try this: stringContains("DoDont", "Dont"); it should find it but it doesn't.
- Louie |
|
![]() |
|
![]() |
|
Oh, sorry, you are right ...
small correction solves this problem. Greetings Reik ------------------------- Evosoft GmbH for Siemens Industry Sector Berlin, Germany |
|
![]() |
|
![]() |
|
Still fails, try this:
print "[" (stringContains("DoDoDnt", "DoDnt")) "]\n" You need two loops, inner loop J for each value of I. - Louie |
|
![]() |
|
![]() |
|
Don't know if findPlainText function is faster or slower but it is lot easier than algorithms proposed here.
See demo //Demo /*Demo finds substring*/ int iOffset=-1 int iLength string sInput="DoDoDnt" string sFind ="DoDnt" bool matchCase=true findPlainText(sInput,sFind, iOffset, iLength, matchCase) if(iOffset>-1) print iOffset "" else print "Not Found" Edited: 4-Nov-2008 at 19:04 by ron lewis |
|
![]() |
|
![]() |
|
Yes, native functions should be prefered due to much better runtime.
Especially if the function will be called as often as "stringContains" or "stringFind". Just make your own by wrapping in case of error-catching or interface-simplification. In case of regExps you must know how to use them - then this is also an option for efficient stringContains-functionality. Always keep in mind how the interpreter is working. |
|
![]() |
|
![]() |
|
Hi Louie,
it seems that you are the best tester alive ![]() Another small chnage - hopefully you won't found an bad case again ![]() Greetings Reik ------------------------- Evosoft GmbH for Siemens Industry Sector Berlin, Germany |
|
![]() |
|
![]() |
|
Hi Ron, Hi Louie,
small test shows, that Ron's method seems to be the fastet one. It will be much faster than the other ones .... Unfortunately my method is the lamest one ![]() Greetings Reik ------------------------- Evosoft GmbH for Siemens Industry Sector Berlin, Germany |
|
![]() |
|
![]() |
|
Logical extension of this discussion is a replacement function that is fast -- attached is one I use.
|
|
![]() |
|
![]() |
|
When I started my DXL carreer I was tasked with changing the company accronym in all the requirement specs. I don't recall exactly the accronyms or issues, but discovered the hard way about finding and replacing strings that were partial sub-sets of each other. I ended up coming up with a now-dead function 'fReplaceStringSafe' that would replace all occurances of A with B, so long as the position of A didn't already contain B; perhaps it would replace "ITT" with "ITTN" so long as "ITTN" wasn't already there; thus preventing "ITTN" from becoming "ITTNN". Anyway, that experience was in the back of my mind reading your code.
Cannot find an example where your code fails, but [1] Modifying the main loop variable inside the loop give me major HeeBeeJeeBees. No, use nested loops where the inner J loop has a break statement. [2] Your second 'if' statement should be an 'elseif' statement. [3] you don't need to search the entire length of s1; only the first so-many characters: if both strings are the same length you need to search from only the first character since the sub-string cannot start in the 2nd character. Therefore you main loop should perhaps look like this: for (i=0; i<Len1-Len2 && j < Len2; i++). [4] you can drastically improve performance by using global but private Buffers for this function, creating them only once (in the main code) and never deleting them, just re-using them. That is, move the 'Buffer b1 = create()' statement outside the function. Since its global you'll need to give it a clever name; I would use 'gl_buf1StringContains'. Digressing, when I started coding PL1 using punch cards in college, I punched up 1000s of cards all with a semi-colon in the last column, since I hated having to put a semi-colon at the end of a line that was OBVIOUSLY the end of the statement. I see you put semi-colons, even though DXL is clever enough to know the end of the statement. Not sure why I said that. I'll do some testing to find the 'best' algorithm, but suspect that checking buffers as you have done will prove faster than checking strings with findPlainText. This is very important to my code since my library functions fGetOffset and fReplaceString are used 100s of 1000s of times for my big scripts. - Louie |
|
![]() |
|
![]() |
|
Here is an alternate that uses buffers -- don't know if it is faster.
Perhaps Reik may want to write a test fixture to test. |
|
![]() |
|
![]() |
|
Here are two more alternates that uses buffers -- don't know which of the three I presented in this forum is fasts.
Perhaps Reik may want to write a test fixture to test. Edited: 5-Nov-2008 at 16:36 by ron lewis |
|
![]() |
|
![]() |
|
Hi Ron, Hi Louie,
first an answer to Louie's question about "semicolon" - we are using Doxygen to generate Code documentation. Doxygen needs the semicolons to detect the end of line - otherwise the resuling documention is not as expected ![]() I testet a little bit more with the first three functions:
After implementing Louie's suggestion to use global text Buffer my function was much faster than before - I do not like to have global dynamic variables but here the performance is so much faster that I could make an exception ![]() Ron's replace functions I did not testet because the original post was about searching a string, not replacing it. With longer strings the runing times become another ratio: Louie's method becomes the fastest, but get problems with characters with special meaning in regular expression, e.g. .+* My method is the second And Ron's findPlainText is the slowest. If you want to try your own - the code is attached to this message... Greetings Reik ------------------------- Evosoft GmbH for Siemens Industry Sector Berlin, Germany Edited: 6-Nov-2008 at 07:25 by Reik Schroeder |
|
![]() |
|
![]() |
|
Reik, Either of the following two attached function appears to be faster if you want to test.
int FastestFind (Buffer b1, string s2) { if(matches(s2,tempStringOf(b1))) return start 0 return -1 } int FastFind (string s1, s2) { if(matches(s2,s1)) return start 0 return -1 } |
|
![]() |
Telelogic DOORS
» DXL Exchange
»
Search Algorithm
|
![]() |
FuseTalk Standard Edition v3.2 - © 1999-2009 FuseTalk Inc. All rights reserved.