Icon View Thread

The following is the text of the current message along with any replies.
Messages 11 to 20 of 28 total
Thread Possible bug using fully qualified references
Tue, May 8 2007 7:48 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Roy,

<< Hmm. My take was that TEXTSEARCH was for finding words in a "text" column
indexing simply made it faster. I don't know about under the hood but for me
it was a lot better than writing a wadge of LIKE '%xxx%' if you wanted more
than one word. >>

Sure, but the point is that it is virtually unusable on large databases
unless the columns are indexed.

--
Tim Young
Elevate Software
www.elevatesoft.com

Wed, May 9 2007 3:01 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim


><< Hmm. My take was that TEXTSEARCH was for finding words in a "text" column
>indexing simply made it faster. I don't know about under the hood but for me
>it was a lot better than writing a wadge of LIKE '%xxx%' if you wanted more
>than one word. >>
>
>Sure, but the point is that it is virtually unusable on large databases
>unless the columns are indexed.

Let battle commence! So on that basis you're going to ban LIKE are you?

I agree that it would be very slow if used by itself on any decent sized database BUT 1) if that's the only way to get the data its the only way to get the data wether you use CONTAINS or LIKE and 2) often it will be used in conjunction with a test based on an index and hence the number of rows to be scanned will be less (hopefully a lot less) than the total in the database.

In either case its a lot easier to write (or generate)

field CONTAINS 'one two three'

than

(field COLLATE ANSI_CI LIKE '%one%' OR field COLLATE ANSI_CI LIKE '%two%' OR field COLLATE ANSI_CI LIKE '%three%')

which could generate the wrong results unless you extend it to include all word break characters ('",.!? etc)

Roy Lambert
Wed, May 9 2007 5:43 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Roy,

<< Let battle commence! So on that basis you're going to ban LIKE are you?
>>

Absolutely not.  But LIKE isn't necessarily intended to be indexed.

<< I agree that it would be very slow if used by itself on any decent sized
database BUT 1) if that's the only way to get the data its the only way to
get the data wether you use CONTAINS or LIKE and 2) often it will be used in
conjunction with a test based on an index and hence the number of rows to be
scanned will be less (hopefully a lot less) than the total in the database.

In either case its a lot easier to write (or generate)

field CONTAINS 'one two three'

than

(field COLLATE ANSI_CI LIKE '%one%' OR field COLLATE ANSI_CI LIKE '%two%'
OR field COLLATE ANSI_CI LIKE '%three%')

which could generate the wrong results unless you extend it to include all
word break characters ('",.!? etc) >>

Sure, but you act like we're asking you to jump through hoops here.  If you
want to use CONTAINS, then all you need to do is make sure that you use
CREATE TEXT INDEX to create a text index for the column that you want to
search on.

--
Tim Young
Elevate Software
www.elevatesoft.com

Thu, May 10 2007 5:11 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim

><< Let battle commence! So on that basis you're going to ban LIKE are you?
> >>
>
>Absolutely not. But LIKE isn't necessarily intended to be indexed.

How many other SQL selection keywords MANDATE an index?

>Sure, but you act like we're asking you to jump through hoops here. If you
>want to use CONTAINS, then all you need to do is make sure that you use
>CREATE TEXT INDEX to create a text index for the column that you want to
>search on.

Whilst I've seen that the full text indexing speed in ElevateDB is far greater than in DBISAM it still takes time. It adds additional data into the index file making it larger. Naturally, if its anticipated that a column might be subject to a CONTAINS on a frequent basis I would index it (if only because I get bored waiting for the result) on the other hand if the word search is only going to be an occasional requirement, but is non-the-less valuable in reducing the amount of data to be eyeballed or printed then I wouldn't.

Taking the very app I'm using to create this email. I have text indices on _Subject;_Message;_Comments;_Flags for speed but I don't have them on _Headers, _WhoTo, _WhoFrom but occasionally I need to add these into a query using TEXTSEARCH. The index file is already 150Mb+, with no words <3 or >40 characters, all html tags stripped and a decent list of stop words (upgrading to ElevateDB and using the base FTI stuff increased the index to 250Mb+).

I haven't done any real speed tests yet in ElevateDB but I just tried deleting a few rows - I don't know how much of the time was the FTI (which is the big problem with DBISAM) and how much general

================================================================================
SQL Delete (Executed by ElevateDB 1.02 Build 1)

Note: The SQL  shown here is generated by ElevateDB and may not be exactly the
same as the SQL that was originally entered.  However, none of the differences
alter the execution results in any way.

================================================================================


DELETE FROM "mandn"
WHERE "_boxno" = 17

Target Table
------------

mandn: 149029 rows

Filtering
---------

The following filter condition was applied to the mandn table:

"_boxno" = 17 [Index scan: 76 keys, 4096 bytes estimated cost]

================================================================================
76 row(s) deleted in 29.406 secs
================================================================================
Thu, May 10 2007 1:56 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Roy,

<< How many other SQL selection keywords MANDATE an index? >>

Well, considering that it is a text index operator, I think we're okay. Smiley

<< Whilst I've seen that the full text indexing speed in ElevateDB is far
greater than in DBISAM it still takes time. It adds additional data into the
index file making it larger. Naturally, if its anticipated that a column
might be subject to a CONTAINS on a frequent basis I would index it (if only
because I get bored waiting for the result) on the other hand if the word
search is only going to be an occasional requirement, but is non-the-less
valuable in reducing the amount of data to be eyeballed or printed then I
wouldn't.

Taking the very app I'm using to create this email. I have text indices on
_Subject;_Message;_Comments;_Flags for speed but I don't have them on
_Headers, _WhoTo, _WhoFrom but occasionally I need to add these into a query
using TEXTSEARCH. >>

Personally, I don't see much value to doing word searches on email user
names or the email headers.  Word searches on most useful on actual written
text with sentence structure.  Any other use is probably mis-using the text
indexing in a way that it isn't intended to be used.

<< The index file is already 150Mb+, with no words <3 or >40 characters, all
html tags stripped and a decent list of stop words (upgrading to ElevateDB
and using the base FTI stuff increased the index to 250Mb+). >>

What did you use for the word size for the text indexes ?  The smaller the
better with EDB.  Also, don't get too worked up about the index size.  One
of the big issues with speed with DBISAM was the way the index compression
worked, and EDB doesn't have the same issue.  In fact, you'll consistently
notice that, even with larger indexes, EDB will out-perform DBISAM in most
cases.

<< I haven't done any real speed tests yet in ElevateDB but I just tried
deleting a few rows - I don't know how much of the time was the FTI (which
is the big problem with DBISAM) and how much general >>

If it took 29 seconds to delete 76 rows, then you've got *a lot* of data
indexed for those 76 rows.  How big are the BLOB columns being indexed for
those rows ?

--
Tim Young
Elevate Software
www.elevatesoft.com

Thu, May 10 2007 2:34 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim

><< How many other SQL selection keywords MANDATE an index? >>
>
>Well, considering that it is a text index operator, I think we're okay. Smiley

You may just notice that I'm DISAGREEING! still Smiley

>Personally, I don't see much value to doing word searches on email user
>names or the email headers. Word searches on most useful on actual written
>text with sentence structure. Any other use is probably mis-using the text
>indexing in a way that it isn't intended to be used.

Whilst I sort of agree with you I don't think you can control, or should try to control the "appropriate" use of operators, just make sure they operate correctly (regardless of the existence of an index). If its an architectural issue I'll shut up. Otherwise I'll keep on trying to persuade you because I find it very useful.

Its precisely because I don't see much use for it that I don't want to index it, but on some occasions it is useful eg if I know you've answered a post of mine sometime in the last couple of months, but say the only word I could remember was "index" I can use that to filter the messages, add in the date range and finally add my name from the sender column. Please don't reply that my name is in the message Smiley

><< The index file is already 150Mb+, with no words <3 or >40 characters, all
>html tags stripped and a decent list of stop words (upgrading to ElevateDB
>and using the base FTI stuff increased the index to 250Mb+). >>
>
>What did you use for the word size for the text indexes ? The smaller the
>better with EDB.

Just the defaults.

>Also, don't get too worked up about the index size. One
>of the big issues with speed with DBISAM was the way the index compression
>worked, and EDB doesn't have the same issue. In fact, you'll consistently
>notice that, even with larger indexes, EDB will out-perform DBISAM in most
>cases.

I'm not overly bothered about the size (I will be implementing my custom filter) but rather just illustrating the size of the problem, and I most definitely have noticed that EDB is faster than DBISAM, but I am concerned about speed.

><< I haven't done any real speed tests yet in ElevateDB but I just tried
>deleting a few rows - I don't know how much of the time was the FTI (which
>is the big problem with DBISAM) and how much general >>
>
>If it took 29 seconds to delete 76 rows, then you've got *a lot* of data
>indexed for those 76 rows. How big are the BLOB columns being indexed for
>those rows ?

The .blb is about 1.4Gb in total the indexed columns are probably 70% - 90% of that.



ps I have deliberately not added my name
Fri, May 11 2007 2:50 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Roy,

<>< Whilst I sort of agree with you I don't think you can control, or should
try to control the "appropriate" use of operators, just make sure they
operate correctly (regardless of the existence of an index). If its an
architectural issue I'll shut up. Otherwise I'll keep on trying to persuade
you because I find it very useful. >>

I told you it was an architectural issue when you first brought it up:

http://www.elevatesoft.com/scripts/newsgrp.dll?action=openmsg&group=16&msg=856&page=1#msg856

<< Its precisely because I don't see much use for it that I don't want to
index it, but on some occasions it is useful eg if I know you've answered a
post of mine sometime in the last couple of months, but say the only word I
could remember was "index" I can use that to filter the messages, add in the
date range and finally add my name from the sender column. Please don't
reply that my name is in the message Smiley>>

I understand.  But you can still do all of the same by combining the various
methods - CONTAINS for word searches, and LIKE, etc. for wildcard searches
on the names and headers.

<< Just the defaults. >>

Bumping the word size down to 20 instead of 30 characters should free up
some space.

<< I'm not overly bothered about the size (I will be implementing my custom
filter) but rather just illustrating the size of the problem, and I most
definitely have noticed that EDB is faster than DBISAM, but I am concerned
about speed. >>

I understand, but size isn't as much of a factor anymore (that's what *he*
said Smiley in terms of speed.

<< The .blb is about 1.4Gb in total the indexed columns are probably 70% -
90% of that. >>

The columns for the 76 rows are 70-90% of that ?  If so, then that explains
the time required.

<< ps I have deliberately not added my name >>

And now you're completely invisible. Smiley

--
Tim Young
Elevate Software
www.elevatesoft.com

Sat, May 12 2007 4:15 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim


>I told you it was an architectural issue when you first brought it up:

I'll have to take your word for that cos having re-read the post I still don't understand it. But I'll keep my word and shut up.

<<I understand.  But you can still do all of the same by combining the various
methods - CONTAINS for word searches, and LIKE, etc. for wildcard searches
on the names and headers.>>

As I said LIKE isn't a direct comparison for CONTAINS eg

CONTAINS 'fred bert joe'

would have to become

LIKE '%fred%' AND LIKE '%bert%' AND LIKE '%joe'

which would return true for 'frederica bert joe' which is wrong.

It means I'll need to write my own function to do it. Any hints or suggestions (especially code) gratefully received.

Roy Lambert
Sat, May 12 2007 8:26 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim


><< The .blb is about 1.4Gb in total the indexed columns are probably 70% -
>90% of that. >>
>
>The columns for the 76 rows are 70-90% of that ? If so, then that explains
>the time required.

Once I've got 1.03 I'll run some speed tests with and without text indices, and with and without the additional 3 columns indexed. My gut feel is that its better than DBISAM but I can't check that without re-indexing since I have my custom filter which would influence timings.

Roy Lambert

Mon, May 14 2007 5:36 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Roy,

<< I'll have to take your word for that cos having re-read the post I still
don't understand it. But I'll keep my word and shut up. >>

I'm not asking you to shut up, just reiterating my original point. Smiley

<< As I said LIKE isn't a direct comparison for CONTAINS eg

CONTAINS 'fred bert joe'

would have to become

LIKE '%fred%' AND LIKE '%bert%' AND LIKE '%joe'

which would return true for 'frederica bert joe' which is wrong. >>

<< It means I'll need to write my own function to do it. Any hints or
suggestions (especially code) gratefully received. >>

An external function would work just fine, and you can use the same
filtering, etc. code that you use for the text filter.

« Previous PagePage 2 of 3Next Page »
Jump to Page:  1 2 3
Image