Speed of Full Text Indexing |
Wed, Mar 14 2007 10:27 AM | Permanent Link |
Igor Colovic | I have created a test application using the trial DBISAM database.
My main interest is the full text index. Using the Engine.BuildWordList function, I have noticed that it is slow. I do not know why (I do not have the source). I have a function that splits a 15 MB text file into words in less than 1 second (excluding loading). I can post it here.

Another thing is encryption. Why is it so slow? If I do not encrypt the table, inserting the data (~14,000 documents, ~1 GB of data) is fast, ~1-2 min. With encryption it takes ~15 min. Searching the newsgroups, I found that I can change the encryption (use my own). I have tried this, but there is no speed gain. I am not an expert in encryption, but why is the engine serving me only 8 bytes of data to encrypt? Would it not be more efficient to encrypt larger blocks of data? The engine sends whole BLOBs for compression/decompression, so why not serve the whole BLOB for encryption? Using Blowfish encryption, a file of ~15 MB is encrypted in 1-2 sec. |
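Igor's word-splitting function is not posted in the thread; a minimal sketch of the operation he describes (splitting a text buffer into a sorted, duplicate-free word list, the same output BuildWordList produces) might look like this in Python. The tokenization rule (split on non-alphanumeric runs) is an assumption, not DBISAM's actual filtering:

```python
# Sketch only - NOT DBISAM's BuildWordList. Splits text into a sorted,
# duplicate-free word list, splitting on runs of non-alphanumeric characters.
import re

def build_word_list(text: str) -> list[str]:
    words = re.split(r"[^0-9A-Za-z]+", text)
    # A set removes duplicates in one pass; sorted() gives the final ordering.
    return sorted({w for w in words if w})

print(build_word_list("the quick, the lazy; quick!"))  # ['lazy', 'quick', 'the']
```

A single regex split plus a set is why a hand-rolled splitter can be fast: there are no per-word callbacks, unlike the filtering hooks Tim describes below.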
Wed, Mar 14 2007 6:38 PM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Igor,
<< I have created a test application using the trial DBISAM database. My main interest is the full text index. Using the Engine.BuildWordList function, I have noticed that it is slow. I do not know why (I do not have the source). I have a function that splits a 15 MB text file into words in less than 1 second (excluding loading). I can post it here. >>

DBISAM has to perform a lot of callbacks for text index filtering during the BuildWordList function, and it also has to remove duplicate words.

<< Another thing is encryption. Why is it so slow? If I do not encrypt the table, inserting the data (~14,000 documents, ~1 GB of data) is fast, ~1-2 min. With encryption it takes ~15 min. >>

The encryption doesn't occur on the entire file - each record, index page, or BLOB block needs to be encrypted separately so that they can be read separately. This can be expensive when a lot of I/O is occurring.

<< Searching the newsgroups, I found that I can change the encryption (use my own). I have tried this, but there is no speed gain. I am not an expert in encryption, but why is the engine serving me only 8 bytes of data to encrypt? Would it not be more efficient to encrypt larger blocks of data? >>

DBISAM is designed around Blowfish, which encrypts data in 8-byte blocks.

<< The engine sends whole BLOBs for compression/decompression, so why not serve the whole BLOB for encryption? Using Blowfish encryption, a file of ~15 MB is encrypted in 1-2 sec. >>

Again, you're thinking in terms of "files", which is not how a database stores records, indexes, or BLOBs. They are divided up into blocks, and these blocks are what is encrypted/decrypted.

-- Tim Young Elevate Software www.elevatesoft.com |
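Tim's point is that an 8-byte block cipher processes any buffer - record, index page, or BLOB block - as a sequence of 8-byte blocks, padding the last one. The toy sketch below illustrates the chunking only; it uses XOR as a placeholder and is emphatically not Blowfish or DBISAM's actual encryption:

```python
# Toy illustration of 8-byte block-cipher chunking. The XOR step is a
# placeholder for a real cipher round (e.g. Blowfish) - do not use for security.
BLOCK = 8

def pad(data: bytes) -> bytes:
    # Zero-pad to a multiple of the block size (for illustration only).
    rem = len(data) % BLOCK
    return data + b"\x00" * ((BLOCK - rem) % BLOCK)

def encrypt_buffer(data: bytes, key: bytes) -> bytes:
    data = pad(data)
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        # A real engine would invoke the cipher here, once per 8-byte block.
        out += bytes(b ^ key[j % len(key)] for j, b in enumerate(block))
    return bytes(out)

ciphertext = encrypt_buffer(b"record data", b"k3y")
print(len(ciphertext))  # 16 - an 11-byte record occupies two 8-byte blocks
```

Because XOR is its own inverse here, applying `encrypt_buffer` twice recovers the padded plaintext, which makes the round-trip easy to check.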
Thu, Mar 15 2007 5:28 AM | Permanent Link |
Igor Colovic | "Tim Young [Elevate Software]" <timyoung@elevatesoft.com> wrote:
<< DBISAM has to perform a lot of callbacks for text index filtering during the BuildWordList function, and it also has to remove duplicate words. >>

Yes, I know. The resulting word list is sorted and there are no duplicate words. In a new test I used a TDBISAMStringList with LocaleID, Sorted, and dupIgnore. Mine is still faster - not by much, but any gain in speed in a database is a plus. I repeated the test using all (~1 GB) of the files. These files are, in the end, stored in the database.

BuildWordList: 14:09
My test: 10:24

<< The encryption doesn't occur on the entire file - each record, index page, or BLOB block needs to be encrypted separately so that they can be read separately. This can be expensive when a lot of I/O is occurring. DBISAM is designed around Blowfish, which encrypts data in 8-byte blocks. >>

I know that every field has to be encrypted. The thing is that I am moving a project from Apollo (xBase) and I am testing DBISAM as a replacement. Apollo has encryption, and it encrypts whole fields, BLOBs, etc. Why can't DBISAM do this?

<< Again, you're thinking in terms of "files", which is not how a database stores records, indexes, or BLOBs. They are divided up into blocks, and these blocks are what is encrypted/decrypted. >>

No. These files are stored in the database (memo fields). Every time I want data from a BLOB field, I want the whole data, right? So why not encrypt the whole BLOB field? Maybe use 8-byte blocks for every other field, but not MEMO/BLOB fields - those could be encrypted as a whole. This would speed things up. What is the size of these blocks? Is DBISAM writing/reading data in 8-byte blocks? |
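The sorted, duplicate-ignoring string list Igor mentions (a TDBISAMStringList with Sorted and dupIgnore) inserts each word at its sorted position and silently drops duplicates. A Python sketch of that insertion behavior, using a binary search to find the slot (the Delphi class is not shown here, so this is an approximation of its semantics):

```python
# Approximates a sorted string list with Duplicates = dupIgnore:
# each word is inserted at its sorted position; existing words are skipped.
import bisect

def add_word(word_list: list[str], word: str) -> None:
    i = bisect.bisect_left(word_list, word)  # binary search for the slot
    if i == len(word_list) or word_list[i] != word:
        word_list.insert(i, word)  # only insert if not already present

words: list[str] = []
for w in ["quick", "the", "quick", "lazy"]:
    add_word(words, w)
print(words)  # ['lazy', 'quick', 'the']
```

Note that `list.insert` is O(n) per word, so for very large inputs the split-then-sort approach sketched earlier in the thread tends to scale better than incremental sorted insertion.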
Fri, Mar 16 2007 3:42 PM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Igor,
<< Yes, I know. The resulting word list is sorted and there are no duplicate words. In a new test I used a TDBISAMStringList with LocaleID, Sorted, and dupIgnore. Mine is still faster - not by much, but any gain in speed in a database is a plus. I repeated the test using all (~1 GB) of the files. These files are, in the end, stored in the database. BuildWordList: 14:09 My test: 10:24 >>

Again, BuildWordList has to perform callbacks to check for any user-defined text filtering functionality. The function call overhead is probably contributing some time to the process.

<< I know that every field has to be encrypted. >>

Not every field - every record.

<< The thing is that I am moving a project from Apollo (xBase) and I am testing DBISAM as a replacement. Apollo has encryption, and it encrypts whole fields, BLOBs, etc. Why can't DBISAM do this? >>

DBISAM does do that. Also, what kind of encryption is Apollo using?

<< No. These files are stored in the database (memo fields). Every time I want data from a BLOB field, I want the whole data, right? So why not encrypt the whole BLOB field? >>

Because the encryption in DBISAM is handled at a lower level than that. It is handled at the buffer manager level for all records, index pages, and BLOB blocks. Besides, it doesn't really matter - the same amount of data will need to be encrypted/decrypted either way; it doesn't matter when it takes place. How big are the files that you're storing? If they are normally very large, then you should consider increasing the BLOB block size for the table from 512 bytes to something larger, like 2048 bytes.

<< Maybe use 8-byte blocks for every other field, but not MEMO/BLOB fields - those could be encrypted as a whole. This would speed things up. >>

No, it wouldn't - see my comments above.

-- Tim Young Elevate Software www.elevatesoft.com |
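Tim's suggestion to raise the BLOB block size trades wasted space in the last block for fewer blocks (and thus fewer per-block buffer-manager operations) per stored file. A quick sketch of that arithmetic for the ~15 MB file discussed in the thread:

```python
# Blocks needed to store a BLOB of a given size at a given block size.
# Fewer blocks means fewer per-block reads, writes, and encrypt/decrypt calls.
import math

def blob_blocks(file_size: int, block_size: int) -> int:
    return math.ceil(file_size / block_size)

size = 15 * 1024 * 1024  # the ~15 MB file from the discussion
print(blob_blocks(size, 512))   # 30720 blocks at the default 512-byte size
print(blob_blocks(size, 2048))  # 7680 blocks at 2048 bytes - 4x fewer
```

The total bytes encrypted stay the same, which is Tim's point: the win from a larger block size is reduced per-block overhead, not less encryption work.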