Support Forums - View Thread

View Thread

The following is the text of the current message along with any replies.

Messages 11 to 20 of 28 total

Possible bug using fully qualified references

Tue, May 8 2007 7:48 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Roy, << Hmm. My take was that TEXTSEARCH was for finding words in a "text" column indexing simply made it faster. I don't know about under the hood but for me it was a lot better than writing a wadge of LIKE '%xxx%' if you wanted more than one word. >> Sure, but the point is that it is virtually unusable on large databases unless the columns are indexed. -- Tim Young Elevate Software www.elevatesoft.com
Wed, May 9 2007 3:01 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Tim ><< Hmm. My take was that TEXTSEARCH was for finding words in a "text" column >indexing simply made it faster. I don't know about under the hood but for me >it was a lot better than writing a wadge of LIKE '%xxx%' if you wanted more >than one word. >> > >Sure, but the point is that it is virtually unusable on large databases >unless the columns are indexed. Let battle commence! So on that basis you're going to ban LIKE are you? I agree that it would be very slow if used by itself on any decent sized database BUT 1) if that's the only way to get the data its the only way to get the data wether you use CONTAINS or LIKE and 2) often it will be used in conjunction with a test based on an index and hence the number of rows to be scanned will be less (hopefully a lot less) than the total in the database. In either case its a lot easier to write (or generate) field CONTAINS 'one two three' than (field COLLATE ANSI_CI LIKE '%one%' OR field COLLATE ANSI_CI LIKE '%two%' OR field COLLATE ANSI_CI LIKE '%three%') which could generate the wrong results unless you extend it to include all word break characters ('",.!? etc) Roy Lambert
Wed, May 9 2007 5:43 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Roy, << Let battle commence! So on that basis you're going to ban LIKE are you? >> Absolutely not. But LIKE isn't necessarily intended to be indexed. << I agree that it would be very slow if used by itself on any decent sized database BUT 1) if that's the only way to get the data its the only way to get the data wether you use CONTAINS or LIKE and 2) often it will be used in conjunction with a test based on an index and hence the number of rows to be scanned will be less (hopefully a lot less) than the total in the database. In either case its a lot easier to write (or generate) field CONTAINS 'one two three' than (field COLLATE ANSI_CI LIKE '%one%' OR field COLLATE ANSI_CI LIKE '%two%' OR field COLLATE ANSI_CI LIKE '%three%') which could generate the wrong results unless you extend it to include all word break characters ('",.!? etc) >> Sure, but you act like we're asking you to jump through hoops here. If you want to use CONTAINS, then all you need to do is make sure that you use CREATE TEXT INDEX to create a text index for the column that you want to search on. -- Tim Young Elevate Software www.elevatesoft.com
Thu, May 10 2007 5:11 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Tim ><< Let battle commence! So on that basis you're going to ban LIKE are you? > >> > >Absolutely not. But LIKE isn't necessarily intended to be indexed. How many other SQL selection keywords MANDATE an index? >Sure, but you act like we're asking you to jump through hoops here. If you >want to use CONTAINS, then all you need to do is make sure that you use >CREATE TEXT INDEX to create a text index for the column that you want to >search on. Whilst I've seen that the full text indexing speed in ElevateDB is far greater than in DBISAM it still takes time. It adds additional data into the index file making it larger. Naturally, if its anticipated that a column might be subject to a CONTAINS on a frequent basis I would index it (if only because I get bored waiting for the result) on the other hand if the word search is only going to be an occasional requirement, but is non-the-less valuable in reducing the amount of data to be eyeballed or printed then I wouldn't. Taking the very app I'm using to create this email. I have text indices on _Subject;_Message;_Comments;_Flags for speed but I don't have them on _Headers, _WhoTo, _WhoFrom but occasionally I need to add these into a query using TEXTSEARCH. The index file is already 150Mb+, with no words <3 or >40 characters, all html tags stripped and a decent list of stop words (upgrading to ElevateDB and using the base FTI stuff increased the index to 250Mb+). I haven't done any real speed tests yet in ElevateDB but I just tried deleting a few rows - I don't know how much of the time was the FTI (which is the big problem with DBISAM) and how much general ================================================================================ SQL Delete (Executed by ElevateDB 1.02 Build 1) Note: The SQL shown here is generated by ElevateDB and may not be exactly the same as the SQL that was originally entered. However, none of the differences alter the execution results in any way. ================================================================================ DELETE FROM "mandn" WHERE "_boxno" = 17 Target Table ------------ mandn: 149029 rows Filtering --------- The following filter condition was applied to the mandn table: "_boxno" = 17 [Index scan: 76 keys, 4096 bytes estimated cost] ================================================================================ 76 row(s) deleted in 29.406 secs ================================================================================
Thu, May 10 2007 1:56 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Roy, << How many other SQL selection keywords MANDATE an index? >> Well, considering that it is a text index operator, I think we're okay. << Whilst I've seen that the full text indexing speed in ElevateDB is far greater than in DBISAM it still takes time. It adds additional data into the index file making it larger. Naturally, if its anticipated that a column might be subject to a CONTAINS on a frequent basis I would index it (if only because I get bored waiting for the result) on the other hand if the word search is only going to be an occasional requirement, but is non-the-less valuable in reducing the amount of data to be eyeballed or printed then I wouldn't. Taking the very app I'm using to create this email. I have text indices on _Subject;_Message;_Comments;_Flags for speed but I don't have them on _Headers, _WhoTo, _WhoFrom but occasionally I need to add these into a query using TEXTSEARCH. >> Personally, I don't see much value to doing word searches on email user names or the email headers. Word searches on most useful on actual written text with sentence structure. Any other use is probably mis-using the text indexing in a way that it isn't intended to be used. << The index file is already 150Mb+, with no words <3 or >40 characters, all html tags stripped and a decent list of stop words (upgrading to ElevateDB and using the base FTI stuff increased the index to 250Mb+). >> What did you use for the word size for the text indexes ? The smaller the better with EDB. Also, don't get too worked up about the index size. One of the big issues with speed with DBISAM was the way the index compression worked, and EDB doesn't have the same issue. In fact, you'll consistently notice that, even with larger indexes, EDB will out-perform DBISAM in most cases. << I haven't done any real speed tests yet in ElevateDB but I just tried deleting a few rows - I don't know how much of the time was the FTI (which is the big problem with DBISAM) and how much general >> If it took 29 seconds to delete 76 rows, then you've got a lot of data indexed for those 76 rows. How big are the BLOB columns being indexed for those rows ? -- Tim Young Elevate Software www.elevatesoft.com
Thu, May 10 2007 2:34 PM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Tim ><< How many other SQL selection keywords MANDATE an index? >> > >Well, considering that it is a text index operator, I think we're okay. You may just notice that I'm DISAGREEING! still >Personally, I don't see much value to doing word searches on email user >names or the email headers. Word searches on most useful on actual written >text with sentence structure. Any other use is probably mis-using the text >indexing in a way that it isn't intended to be used. Whilst I sort of agree with you I don't think you can control, or should try to control the "appropriate" use of operators, just make sure they operate correctly (regardless of the existence of an index). If its an architectural issue I'll shut up. Otherwise I'll keep on trying to persuade you because I find it very useful. Its precisely because I don't see much use for it that I don't want to index it, but on some occasions it is useful eg if I know you've answered a post of mine sometime in the last couple of months, but say the only word I could remember was "index" I can use that to filter the messages, add in the date range and finally add my name from the sender column. Please don't reply that my name is in the message ><< The index file is already 150Mb+, with no words <3 or >40 characters, all >html tags stripped and a decent list of stop words (upgrading to ElevateDB >and using the base FTI stuff increased the index to 250Mb+). >> > >What did you use for the word size for the text indexes ? The smaller the >better with EDB. Just the defaults. >Also, don't get too worked up about the index size. One >of the big issues with speed with DBISAM was the way the index compression >worked, and EDB doesn't have the same issue. In fact, you'll consistently >notice that, even with larger indexes, EDB will out-perform DBISAM in most >cases. I'm not overly bothered about the size (I will be implementing my custom filter) but rather just illustrating the size of the problem, and I most definitely have noticed that EDB is faster than DBISAM, but I am concerned about speed. ><< I haven't done any real speed tests yet in ElevateDB but I just tried >deleting a few rows - I don't know how much of the time was the FTI (which >is the big problem with DBISAM) and how much general >> > >If it took 29 seconds to delete 76 rows, then you've got a lot of data >indexed for those 76 rows. How big are the BLOB columns being indexed for >those rows ? The .blb is about 1.4Gb in total the indexed columns are probably 70% - 90% of that. ps I have deliberately not added my name
Fri, May 11 2007 2:50 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Roy, <>< Whilst I sort of agree with you I don't think you can control, or should try to control the "appropriate" use of operators, just make sure they operate correctly (regardless of the existence of an index). If its an architectural issue I'll shut up. Otherwise I'll keep on trying to persuade you because I find it very useful. >> I told you it was an architectural issue when you first brought it up: http://www.elevatesoft.com/scripts/newsgrp.dll?action=openmsg&group=16&msg=856&page=1#msg856 << Its precisely because I don't see much use for it that I don't want to index it, but on some occasions it is useful eg if I know you've answered a post of mine sometime in the last couple of months, but say the only word I could remember was "index" I can use that to filter the messages, add in the date range and finally add my name from the sender column. Please don't reply that my name is in the message >> I understand. But you can still do all of the same by combining the various methods - CONTAINS for word searches, and LIKE, etc. for wildcard searches on the names and headers. << Just the defaults. >> Bumping the word size down to 20 instead of 30 characters should free up some space. << I'm not overly bothered about the size (I will be implementing my custom filter) but rather just illustrating the size of the problem, and I most definitely have noticed that EDB is faster than DBISAM, but I am concerned about speed. >> I understand, but size isn't as much of a factor anymore (that's what he said in terms of speed. << The .blb is about 1.4Gb in total the indexed columns are probably 70% - 90% of that. >> The columns for the 76 rows are 70-90% of that ? If so, then that explains the time required. << ps I have deliberately not added my name >> And now you're completely invisible. -- Tim Young Elevate Software www.elevatesoft.com
Sat, May 12 2007 4:15 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Tim >I told you it was an architectural issue when you first brought it up: I'll have to take your word for that cos having re-read the post I still don't understand it. But I'll keep my word and shut up. <<I understand. But you can still do all of the same by combining the various methods - CONTAINS for word searches, and LIKE, etc. for wildcard searches on the names and headers.>> As I said LIKE isn't a direct comparison for CONTAINS eg CONTAINS 'fred bert joe' would have to become LIKE '%fred%' AND LIKE '%bert%' AND LIKE '%joe' which would return true for 'frederica bert joe' which is wrong. It means I'll need to write my own function to do it. Any hints or suggestions (especially code) gratefully received. Roy Lambert
Sat, May 12 2007 8:26 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Tim ><< The .blb is about 1.4Gb in total the indexed columns are probably 70% - >90% of that. >> > >The columns for the 76 rows are 70-90% of that ? If so, then that explains >the time required. Once I've got 1.03 I'll run some speed tests with and without text indices, and with and without the additional 3 columns indexed. My gut feel is that its better than DBISAM but I can't check that without re-indexing since I have my custom filter which would influence timings. Roy Lambert
Mon, May 14 2007 5:36 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Roy, << I'll have to take your word for that cos having re-read the post I still don't understand it. But I'll keep my word and shut up. >> I'm not asking you to shut up, just reiterating my original point. << As I said LIKE isn't a direct comparison for CONTAINS eg CONTAINS 'fred bert joe' would have to become LIKE '%fred%' AND LIKE '%bert%' AND LIKE '%joe' which would return true for 'frederica bert joe' which is wrong. >> << It means I'll need to write my own function to do it. Any hints or suggestions (especially code) gratefully received. >> An external function would work just fine, and you can use the same filtering, etc. code that you use for the text filter.

« Previous Page	Page 2 of 3	Next Page »
Jump to Page: 1 2 3