Login ProductsSalesSupportDownloadsAbout |
Home » Technical Support » ElevateDB Technical Support » Support Forums » ElevateDB SQL » View Thread |
Messages 1 to 5 of 5 total |
CONTAINS filter condition with partial word match |
Thu, Feb 26 2009 1:07 PM | Permanent Link |
Peter Moser | Dear Tim,
select * from searchtable where searchcolumn contains 'searchword1* searchword2*' has erratic results: if "searchword1" is actually a complete word adding an asterisk to "searchword1" to perform a partial search still shows the same results of the Example: "house*" finds all words with "house" but not "housekeeper". But "housek*" finds all "housekeeper" "housekeeping" etc. This behavour only occurs when searchtable is reaching a certain size - which is why i cannot provide an example database so easily: table with 50000 records: search performs ok above 50000 records: search performs erratic Please find below two (slighty shortend) execution plans 1. with search term "hund* freu*" which results in 2 rows and 2. with search term "hunde* freu*" with 6 rows Which other information do you need to identiy the problem? Many thanks in advance! ================================================================================ SQL Query (Executed by ElevateDB 2.02 Build 8) ================================================================================ SELECT ALL "ubit_titelstamm"."titel_id" AS "titel_id", .... FROM "ubit_titelstamm" WHERE "st" CONTAINS 'hund* freu*' Source Tables ------------- ubit_titelstamm: 58770 rows Result Set ---------- The result set was insensitive and read-only The result set consisted of zero or more rows Filtering --------- The following filter condition was applied to the ubit_titelstamm table: "st" CONTAINS 'hund* freu*' [Index scan (ubit_titelstamm.Text_st):, 432 keys, 16384 bytes estimated cost] ================================================================================ 2 row(s) returned in 0,016 secs ================================================================================ ================================================================================ SQL Query (Executed by ElevateDB 2.02 Build 8) Note: The SQL shown here is generated by ElevateDB and may not be exactly the same as the SQL that was originally entered. However, none of the differences alter the execution results in any way. ================================================================================ SELECT ALL "ubit_titelstamm"."titel_id" AS "titel_id", FROM "ubit_titelstamm" WHERE "st" CONTAINS 'hunde* freu*' Source Tables ------------- ubit_titelstamm: 58770 rows Result Set ---------- The result set was insensitive and read-only The result set consisted of zero or more rows Filtering --------- The following filter condition was applied to the ubit_titelstamm table: "st" CONTAINS 'hunde* freu*' [Index scan (ubit_titelstamm.Text_st):, 450 keys, 16384 bytes estimated cost] ================================================================================ 6 row(s) returned in 0,015 secs ================================================================================ |
Fri, Feb 27 2009 8:48 AM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Peter,
<< This behavour only occurs when searchtable is reaching a certain size - which is why i cannot provide an example database so easily: table with 50000 records: search performs ok above 50000 records: search performs erratic Please find below two (slighty shortend) execution plans 1. with search term "hund* freu*" which results in 2 rows and 2. with search term "hunde* freu*" with 6 rows Which other information do you need to identiy the problem? >> Unfortunately, I'm going to probably need the table with more than 50,000 rows. I suspect that if I try what you're doing here with less rows, I'm not going to be able to reproduce it. -- Tim Young Elevate Software www.elevatesoft.com |
Fri, Feb 27 2009 10:59 AM | Permanent Link |
Peter Moser | Tim,
just send you a mail to support@elevatesoft.com with the location where and how to retrieve the test data. Thank you very much for your efforts! Peter Moser sys.team software GmbH "Tim Young [Elevate Software]" wrote: >>> Unfortunately, I'm going to probably need the table with more than 50,000 rows. I suspect that if I try what you're doing here with less rows, I'm not going to be able to reproduce it. -- Tim Young Elevate Software www.elevatesoft.com <<< |
Sat, Feb 28 2009 4:56 AM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Peter,
<< just send you a mail to support@elevatesoft.com with the location where and how to retrieve the test data. >> Thanks. The reason that the results change when you add more rows is that you're adding rows with st column values that include the word: Hündchen You can see what the text index looks like with DEU_CI as the collation in this index dump: Hun Row Count: 2 First Row ID: 20275 First Row/Page Pointer: 20275 Huncke Row Count: 1 First Row ID: 52036 First Row/Page Pointer: 52036 Hund Row Count: 75 First Row ID: 549 First Row/Page Pointer: 549 Hündchen Row Count: 1 First Row ID: 56717 First Row/Page Pointer: 56717 Hunde Row Count: 2 First Row ID: 232 First Row/Page Pointer: 232 As you can see, the ü is sorted after the u in terms of the collation order. The problem is that a partial-length search on Hund* causes the search to stop when it reaches the Hündchen value because Hünd <> Hund, thus resulting in less rows selected than you would expect. There are two solutions to this: 1) Use the DEP (German Phone Book order) collation instead: ALTER TEXT INDEX "Text_st" ON "ubit_titelstamm" ("st" COLLATE "DEP_CI") INDEXED WORD LENGTH 20 WORD GENERATOR "Default" and use this SQL instead: SELECT st FROM "ubit_titelstamm" WHERE "st" COLLATE DEP_CI CONTAINS 'hund* freu*' You can see the difference with an index dump when the collation is set to DEP_CI: Hund Row Count: 75 First Row ID: 549 First Row/Page Pointer: 549 Hunde Row Count: 39 First Row ID: 232 First Row/Page Pointer: 232 Hundealltag Row Count: 1 First Row ID: 20694 First Row/Page Pointer: 20694 Hundeausbildung Row Count: 1 First Row ID: 50934 First Row/Page Pointer: 50934 Hundebabys Row Count: 2 First Row ID: 24612 First Row/Page Pointer: 24612 Hundebuch Row Count: 2 First Row ID: 5750 First Row/Page Pointer: 5750 As you can see, umlauts are effectively ignored for collation purposes. 2) Use the accent-insensitive flag on the DEU collation, like this: ALTER TEXT INDEX "Text_st" ON "ubit_titelstamm" ("st" COLLATE "DEU_CI_AI") INDEXED WORD LENGTH 20 WORD GENERATOR "Default" and use this SQL instead: SELECT st FROM "ubit_titelstamm" WHERE "st" COLLATE DEU_CI_AI CONTAINS 'hund* freu*' -- Tim Young Elevate Software www.elevatesoft.com |
Mon, Mar 2 2009 2:37 AM | Permanent Link |
Peter Moser | Tim,
perfect! Thank you once again for the outstanding support! Peter Moser sys.team software GmbH "Tim Young [Elevate Software]" wrote: Peter, << just send you a mail to support@elevatesoft.com with the location where and how to retrieve the test data. >> Thanks. The reason that the results change when you add more rows is that you're adding rows with st column values that include the word: Hündchen You can see what the text index looks like with DEU_CI as the collation in this index dump: Hun Row Count: 2 First Row ID: 20275 First Row/Page Pointer: 20275 Huncke Row Count: 1 First Row ID: 52036 First Row/Page Pointer: 52036 Hund Row Count: 75 First Row ID: 549 First Row/Page Pointer: 549 Hündchen Row Count: 1 First Row ID: 56717 First Row/Page Pointer: 56717 Hunde Row Count: 2 First Row ID: 232 First Row/Page Pointer: 232 As you can see, the ü is sorted after the u in terms of the collation order. The problem is that a partial-length search on Hund* causes the search to stop when it reaches the Hündchen value because Hünd <> Hund, thus resulting in less rows selected than you would expect. There are two solutions to this: 1) Use the DEP (German Phone Book order) collation instead: ALTER TEXT INDEX "Text_st" ON "ubit_titelstamm" ("st" COLLATE "DEP_CI") INDEXED WORD LENGTH 20 WORD GENERATOR "Default" and use this SQL instead: SELECT st FROM "ubit_titelstamm" WHERE "st" COLLATE DEP_CI CONTAINS 'hund* freu*' You can see the difference with an index dump when the collation is set to DEP_CI: Hund Row Count: 75 First Row ID: 549 First Row/Page Pointer: 549 Hunde Row Count: 39 First Row ID: 232 First Row/Page Pointer: 232 Hundealltag Row Count: 1 First Row ID: 20694 First Row/Page Pointer: 20694 Hundeausbildung Row Count: 1 First Row ID: 50934 First Row/Page Pointer: 50934 Hundebabys Row Count: 2 First Row ID: 24612 First Row/Page Pointer: 24612 Hundebuch Row Count: 2 First Row ID: 5750 First Row/Page Pointer: 5750 As you can see, umlauts are effectively ignored for collation purposes. 2) Use the accent-insensitive flag on the DEU collation, like this: ALTER TEXT INDEX "Text_st" ON "ubit_titelstamm" ("st" COLLATE "DEU_CI_AI") INDEXED WORD LENGTH 20 WORD GENERATOR "Default" and use this SQL instead: SELECT st FROM "ubit_titelstamm" WHERE "st" COLLATE DEU_CI_AI CONTAINS 'hund* freu*' -- Tim Young Elevate Software www.elevatesoft.com |
This web page was last updated on Tuesday, September 17, 2024 at 04:19 AM | Privacy PolicySite Map © 2024 Elevate Software, Inc. All Rights Reserved Questions or comments ? E-mail us at info@elevatesoft.com |