Global search (Clipper archive, September 2007)
| Hello
I'm looking for a sample that does a global search through a DBF file (all fields) and returns one array with the record numbers where the word was found.
Looping with many DO WHILEs is very slow! Is there a library with a fast function for that?
Many thanks in advance for any help.
Otto
| |
| Stephen Quinn 2007-09-19, 7:55 am |
| Otto
> I'm looking for a sample who make a global search through a DBF file (all fields)
> and return one array with the rec no where the word was found.
> Many do while are very slow! Are they a lib who have a fast function for that?
FastText Search (FTS)
SIXDRIVER V3 has FTS library built in.
Flexfile 2 & 3 have Query() functions for DBF and memo fields.
All are very fast - I've used them in my apps
FTS requires an index (one per DBF you want to search), though, which needs to be in RecNo order,
i.e. INDEXORD(0), in a 1<->1 relationship.
A small limitation of FTS is that indexing more than 4K of memo data per record isn't ideal for performance.
You should be able to get Flexfile through http://www.Grafxsoft.com. I'm not sure where you'd get Sixdriver V3 these days;
you might ask Brian if he has a copy if you email grafx.
--
CYA
Steve
| |
|
Hello Stephen,
Many thanks for your quick help. Unfortunately I don't have Flexfile or SIXDRIVER.
The libraries I'm using are CA-Tools, Comix and PageScript.
Best regards
Otto
| |
|
| Dear otto:
On Sep 19, 3:34 am, otto <oha...@freesurf.ch> wrote:
> Hello
>
> I'm looking for a sample who make a global
> search through a DBF file (all fields) and
> return one array with the rec no where the
> word was found. Many do while are very
> slow! Are they a lib who have a fast
> function for that?
You may be able to bend WildS or WildSearch to your need:
http://www.the-oasis.net/ftpmaster....nt=ftpgenrl.htm
.... wilds .zip, wildsrch.zip
You could create a function that would do this.
Open the file with fopen,
note the number of records,
note the record length,
note the position of the first record,
skip to the first data record,
 initialize record counter to 1,
(A) read the full record,
search for text,
if found, aadd the record number
skip to next record (if required)
increment counter
if not greater than number of records, repeat at (A)
David A. Smith
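The steps above can be sketched roughly as follows. This is a minimal, untested sketch for Clipper 5.x, not the WildS code: the function name RawDbfSearch() is made up for illustration, it assumes the standard DBF header layout (record count at byte 4, header length at byte 8, record length at byte 10), and it does not see memo text. Because each record is searched as one string, a match can straddle two adjacent fields.

```clipper
#include "fileio.ch"

// Hypothetical helper: returns an array with the numbers of all records
// whose raw bytes contain cSearch (case sensitive, memo text not included)
FUNCTION RawDbfSearch( cFile, cSearch )
   LOCAL aHits   := {}
   LOCAL nHandle := FOpen( cFile, FO_READ + FO_SHARED )
   LOCAL cHeader := Space( 32 )
   LOCAL nRecs, nHdrLen, nRecLen, nRec, cRec

   IF nHandle == F_ERROR
      RETURN aHits
   ENDIF

   IF FRead( nHandle, @cHeader, 32 ) == 32
      nRecs   := Bin2L( SubStr( cHeader,  5, 4 ) )  // number of records
      nHdrLen := Bin2W( SubStr( cHeader,  9, 2 ) )  // position of first record
      nRecLen := Bin2W( SubStr( cHeader, 11, 2 ) )  // record length
      cRec    := Space( nRecLen )

      FSeek( nHandle, nHdrLen, FS_SET )             // skip to first data record
      FOR nRec := 1 TO nRecs
         IF FRead( nHandle, @cRec, nRecLen ) != nRecLen
            EXIT                                    // short read: stop
         ENDIF
         // Byte 1 of each record is the deletion flag ("*" = deleted)
         IF !( Left( cRec, 1 ) == "*" ) .AND. cSearch $ SubStr( cRec, 2 )
            AAdd( aHits, nRec )
            IF Len( aHits ) == 4096                 // array size limit
               EXIT
            ENDIF
         ENDIF
      NEXT
   ENDIF

   FClose( nHandle )
RETURN aHits
```

Usage would be along the lines of aRecs := RawDbfSearch( "CUSTOMER.DBF", "XYZ" ), where the file name is only an example.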
| |
|
| Dear David,
Many thanks for your help.
But the WildS code also needs SIXDRIVER!
.....
#include "SIXCDX.CH"
....
Regards
Otto
| |
| Johan Nel 2007-09-20, 3:55 am |
| Otto,
Seems you don't have the necessary libraries to use some of the
functions to do what you need.
Here is my solution trying to limit the number of loops/evaluations that
need to be done to a minimum without the need of some commercial
libraries, as well as a minimum overhead on Aadd() etc.
Just remember the function is limited to max array size 4096 if I
remember correctly from my Clipper days.
In your code:
....
#include "dbstruct.ch"   // defines DBS_TYPE
....
SELECT (cAlias)
DbGoTop()
WHILE !Eof()
   // Each call returns at most 4096 record numbers and leaves the
   // record pointer just past the last record it examined
   aSearchRec := StrInDbf('XYZ')
   // Do with aSearchRec what you need to do
ENDDO
....
FUNCTION StrInDbf(cSearch)
LOCAL aRecs := Array(4096), aTxtFlds := {}, aStruc
LOCAL nFld, nFldCnt, nCount := 0
aStruc := DbStruct()
// Put TextField positions in an array not to have to evaluate
// non text fields in the loop
Aeval(aStruc, ;
   {|aFld, nPos| iif(aFld[DBS_TYPE] $ 'CM', ;
   Aadd(aTxtFlds, nPos), NIL)})
IF Len(aTxtFlds) > 0
   nFldCnt := Len(aTxtFlds)
   WHILE !Eof()
      FOR nFld := 1 TO nFldCnt
         // The function is case sensitive so otherwise you need
         // to do an UPPER() in the IF statement
         IF cSearch $ FieldGet(aTxtFlds[nFld])
            nCount := nCount + 1
            aRecs[nCount] := RecNo()
            EXIT
         ENDIF
      NEXT
      // Skip before the limit check so the next call does not
      // start on a record that was already reported
      DbSkip()
      IF nCount = 4096
         EXIT
      ENDIF
   ENDDO
ELSE
   // No text fields so we cannot do an evaluation;
   // move to Eof() so the caller's loop ends
   DbGoBottom()
   DbSkip()
ENDIF
Asize(aRecs, nCount)
RETURN aRecs
HTH,
Johan Nel
Pretoria, South Africa.
| |
| Stephen Quinn 2007-09-20, 9:55 pm |
| Otto
> Many thanks for your help.
> Need also Sixdrivers!
> ....
> #include "SIXCDX.CH"
You should experiment with the code a bit.
I had a quick look at the Wilds code and there's nothing in there specific to SIXDRIVER
- maybe TAG on the index (if you want to be pedantic) but COMIX supports them as well
Comment out that line and maybe add the COMIX header(s) and you shouldn't have any trouble compiling it.
--
CYA
Steve
| |
|
| On Thu, 20 Sep 2007 09:53:49 +0200, Johan Nel <johan.nel555@removeall555.xsinet.co.za> wrote:
Hello Johan
Many thanks for your sample. It works.
I ran some tests and made some small changes. But, if I understand your sample, it does not search in numeric fields?
Regards
Otto
| |
|
| On Fri, 21 Sep 2007 02:42:22 GMT, "Stephen Quinn" <stevejqNO@SPbigpond.AMnet.au> wrote:
>Otto
>
>
>You should experiment with the code a bit.
>
>I had a quick look at the Wilds code and there's nothing in there specific to SIXDRIVER
> - maybe TAG on the index (if you want to be pedantic) but COMIX supports them as well
>
>Comment that line and maybe add the COMIX header(s) and you shouldn't have any trouble compiling it.
OK, I will have a look again.
But I like the very small sample from Johan Nel.
| |
| Johan Nel 2007-09-21, 3:55 am |
| Otto,
You are correct. I did it as an example seeing all the others are also
TextSearch algorithms, but no reason you cannot change it to search in
all type of fields.
Although you mentioned in your initial message "return one array with
the rec no where the word was found". No way you going to find a "word"
in anything else than Character or MemoFields.
Regards,
Johan Nel
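One way to extend the loop to every field type is to turn each field value into text before applying $. A minimal sketch, assuming Clipper 5.x; FldAsText() is a hypothetical helper name, and the text forms chosen here for numbers, dates and logicals are only one possible choice:

```clipper
// Hypothetical helper: returns field nPos of the current work area as text
FUNCTION FldAsText( nPos )
   LOCAL xVal  := FieldGet( nPos )
   LOCAL cType := ValType( xVal )

   DO CASE
   CASE cType $ "CM"                 // character and memo: already text
      RETURN xVal
   CASE cType == "N"                 // numeric: drop the leading padding
      RETURN LTrim( Str( xVal ) )
   CASE cType == "D"                 // date: "YYYYMMDD"
      RETURN DToS( xVal )
   CASE cType == "L"                 // logical: "T" or "F"
      RETURN iif( xVal, "T", "F" )
   ENDCASE
RETURN ""
```

Inside StrInDbf() the test would then read IF cSearch $ FldAsText( nFld ), with nFld running from 1 TO FCount() instead of over aTxtFlds. The extra conversions cost time, so the text-only version stays the faster one when only words are searched.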
| |
|
| On Fri, 21 Sep 2007 07:56:28 +0200, Johan Nel <johan.nel555@removeall555.xsinet.co.za> wrote:
>Otto,
>
>You are correct. I did it as an example seeing all the others are also
>TextSearch algorithms, but no reason you cannot change it to search in
>all type of fields.
>
Thanks again. It works well and takes less than 10 s to find a word in more than 20'000 records.
>Although you mentioned in your initial message "return one array with
>the rec no where the word was found". No way you going to find a "word"
>in anything else than Character or MemoFields.
>
Johan
You are right. I will try to make the necessary changes.
Regards
Otto
| |
| Johan Nel 2007-09-21, 3:55 am |
| otto wrote:
> Thanks again. It works well and takes less than 10 s to find a word in more than 20'000 records.
You're welcome, glad it could be of help.
> You are right. I will try to make the necessary changes.
Feel free to contact me if there are any problems regarding the changes.
I wrote the function from memory, so I did not test it. There are still
a couple of changes that could make the function even faster: do the
DbStruct() outside the function and pass the array in as a parameter,
along with aTxtFlds, so that the field loop runs only once and not every
time the function is called. This will however only give a (slight)
speed increase if you need to call the function more than once
(e.g. the 4096 limit is reached before the end of file is reached).
I would prefer it the way it is at the moment, because everything is then
encapsulated inside the function.
Regards,
Johan Nel
| |
| Stephen Quinn 2007-09-21, 9:55 pm |
| Johan
You could actually do away with the array altogether if you built a custom index using OrdKeyAdd().
Eg
// Not sure of the correct syntax for COMIX but basically you want an empty index
// and add records that fit the criteria
INDEX ON RECNO() TAG '_WORDS' TO cTempIdx CUSTOM
DO WHILE ! EOF()
   // Search the records/fields for the word
   IF foundOne
      OrdKeyAdd( RecNo() )
   ENDIF
   SKIP
ENDDO
IF OrdKeyCount() = 0
   // Nothing found so tell the user
   // Put the old order back
   SET ORDER TO nPrevOrder
ENDIF
DBGOTOP()
BROWSE()
--
CYA
Steve
| |
| Joe Wright 2007-09-22, 6:55 pm |
| Johan Nel wrote:
> otto wrote:
> Otto,
>
> You are correct. I did it as an example seeing all the others are also
> TextSearch algorithms, but no reason you cannot change it to search in
> all type of fields.
>
> Although you mentioned in your initial message "return one array with
> the rec no where the word was found". No way you going to find a "word"
> in anything else than Character or MemoFields.
>
> Regards,
>
> Johan Nel
>
A .dbf structure has a 32-byte header, some number of 32-byte field
descriptors and then the 'data' pointed to by the 'offset' in the
header. All of this 'data' is ASCII characters. There is an uncounted
terminating character, chr(26) or ^Z or whatever as the last character
in the .dbf file.
Fields of type 'N' are ASCII too. Assume a Clipper field..
"'amount', 'n', 10, 2"
When we 'replace amount with 123.45' the actual field in the .dbf is
ASCII like '    123.45', ten ASCII characters.
--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
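To see the point about numeric fields from inside Clipper: Str() with the field's width and decimals gives the same right-aligned text that is stored in the .dbf. A tiny sketch, assuming Clipper 5.x (the brackets only make the padding visible):

```clipper
PROCEDURE Main()
   // Same width and decimals as the 'amount', 'n', 10, 2 field above
   LOCAL cStored := Str( 123.45, 10, 2 )

   ? "[" + cStored + "]"        // prints [    123.45]
   ? "123.45" $ cStored         // .T. - a plain substring test finds it
RETURN
```

So a substring search over the raw record finds numbers as long as the search text is written the way the field stores it (decimal point, fixed number of decimals).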
| |
| N:dlzc D:aol T:com \(dlzc\) 2007-09-22, 6:55 pm |
| Dear Joe Wright:
"Joe Wright" <joewwright@comcast.net> wrote in message
news:sK-dnYsBnNhpx2jbnZ2dnUVZ_iydnZ2d@comcast.com...
....
>
> A .dbf structure has a 32-byte header, some number of
> 32-byte field descriptors and then the 'data' pointed to by
> the 'offset' in the header. All of this 'data' is ASCII
> characters. There is an uncounted terminating character,
> chr(26) or ^Z or whatever as the last character in the .dbf
> file.
>
> Fields of type 'N' are ASCII too. Assume a Clipper field..
>
> "'amount', 'n', 10, 2"
>
> When we 'replace amount with 123.45' the actual field
> in the .dbf is ASCII like '    123.45', ten ASCII characters.
Yes, but searching the "glorified flat file" behind the header
has issues in that memo fields are not searched and dates are not
in "user friendly" formats. But you are right, it is possible
that someone would want to search on "number clusters" too.
David A. Smith
| |
| Joe Wright 2007-09-22, 6:55 pm |
| N:dlzc D:aol T:com (dlzc) wrote:
> Dear Joe Wright:
>
> "Joe Wright" <joewwright@comcast.net> wrote in message
> news:sK-dnYsBnNhpx2jbnZ2dnUVZ_iydnZ2d@comcast.com...
> ...
>
> Yes, but searching the "glorified flat file" behind the header
> has issues in that memo fields are not searched and dates are not
> in "user friendly" formats. But you are right, it is possible
> that someone would want to search on "number clusters" too.
>
> David A. Smith
>
>
Dates are not "user friendly"? Today is "20070922". What could be more
friendly than that?
--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
| |
| N:dlzc D:aol T:com \(dlzc\) 2007-09-22, 6:55 pm |
| Dear Joe Wright:
"Joe Wright" <joewwright@comcast.net> wrote in message
news:ka2dnXYOlsDQGGjbnZ2dnUVZ_vumnZ2d@comcast.com...
> N:dlzc D:aol T:com (dlzc) wrote:
>
> Dates are not "user friendly"? Today is "20070922". What
> could be more friendly than that?
Some people still put the day before the month. So more user
friendly would be "2007sep22", but it doesn't sort particularly
well... ;> )
Then we have the looming y2038 crisis when some linux/unix
systems lose their minds. After that we have the y10k crisis,
which I won't be around for...
David A. Smith
| |
| Stephen Quinn 2007-09-23, 3:55 am |
| David
> Some people still put the day before the month. So more user friendly would be "2007sep22", but it doesn't sort
> particularly well... ;> )
People don't have access to the underlying data storage to do that - DBF dates are all stored as 'YYYYMMDD'.
If they're stored in 'non-standard' formats in memoes then it's the user's problem to remember what they entered, not the
developer's.
Memoes can be searched the same way as DBFs using FOpen(); the problem is reversing the offsets into record numbers, which is hard
but not impossible.
--
CYA
Steve
| |
| N:dlzc D:aol T:com \(dlzc\) 2007-09-23, 3:55 am |
| Dear Stephen Quinn:
"Stephen Quinn" <stevejqNO@SPbigpond.AMnet.au> wrote in message
news:4UkJi.1171$H22.0@news-server.bigpond.net.au...
> David
>
> People don't have access to the underlying data storage
> to do that - DBF dates are all stored as 'YYYYMMDD'.
I had proposed opening the dbf (also) with fopen, and searching
each full record as a single string. In that case, the
'YYYYMMDD' would be accessible to search.
> If they're stored in 'non-standard' formats in memoes then
> it's the users problem to remember what they entered
> not the developers.
>
>
> Memoes can be searched the same way as DBFs using
> FOpen(), the problem is reversing the offsets into record
> numbers hard but not impossible.
Are you sure that it is in general possible? Is there a "reverse
pointer" stored with each "memochunk", pointing back to its
associated dbf record? I think you only get one-way pointers,
from dbf to dbt (or whatever).
David A. Smith
| |
| Stephen Quinn 2007-09-23, 3:55 am |
| David
> Are you sure that it is in general possible? Is there a "reverse pointer" stored with each "memochunk", pointing back
> to its associated dbf record? I think you only get one-way pointers, from dbf to dbt (or whatever).
You don't do it separately, but while you step through the DBF.
The following can be done with Flexfile 2 & 3 and FPT memoes; I've never looked at DBTs to see what the memo pointer
contains.
While stepping through the DBF data, read the memo pointer from the record data at its offset in the record
- decode it (it contains (Flexfile2) or is (FPT/Flexfile3) the start-of-data offset into the memo file)
- move to the offset in the memo file
- the length of the data is stored at the offset in the memo (Flexfile & FPT)
- search the data
--
CYA
Steve
| |
| Johan Nel 2007-09-23, 7:55 am |
| Hi Stephen,
Yes I agree with you on this one.
Many ways to skin the cat. My example was just given as a working basis
for Otto to work from; in the end it boils down to not wanting to do
unnecessary checks on fields that are not applicable and that will slow the process
(non-Character fields if searching for words). Obviously reading the
.DBF as a binary file would be the quickest, but there are some "problems"
around MemoFields, although they can be overcome.
Regards,
Johan Nel
Stephen Quinn wrote:
> You could actually do away with the array altogether if you built a custom index using OrdKeyAdd().
| |
| Johan Nel 2007-09-23, 7:55 am |
| Hi Joe,
I agree, reading the .DBF as binary would be quickest to do the search
as all fields are actually stored as ASCII in the .DBF file.
'Amount', 'N', 10, 2
One should just make sure the search algorithm addresses the problem that
a search for, say, "123.34" does not also return a hit for "1123.34" etc.
Regards,
Johan Nel
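If whole-number matching is what is wanted, one option is to look at the characters on either side of each hit. A minimal sketch, assuming Clipper 5.x; HasWholeNum() is a hypothetical name, and "whole" here only means "not touching another digit":

```clipper
// Hypothetical helper: .T. if cNum occurs in cText without a digit
// directly before or directly after it
FUNCTION HasWholeNum( cNum, cText )
   LOCAL nStart := 1, nAt, nEnd

   DO WHILE ( nAt := At( cNum, SubStr( cText, nStart ) ) ) > 0
      nAt  := nStart + nAt - 1        // position of the hit within cText
      nEnd := nAt + Len( cNum )       // first position after the hit
      IF ( nAt == 1 .OR. !IsDigit( SubStr( cText, nAt - 1, 1 ) ) ) .AND. ;
         !IsDigit( SubStr( cText, nEnd, 1 ) )
         RETURN .T.
      ENDIF
      nStart := nAt + 1               // keep looking after this hit
   ENDDO
RETURN .F.
```

With this, HasWholeNum( "123.34", "   1123.34" ) gives .F. and HasWholeNum( "123.34", "    123.34" ) gives .T. A plain $ test stays the right tool when fragments should match too.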
Joe Wright wrote:
> header. All of this 'data' is ASCII characters. There is an uncounted
> terminating character, chr(26) or ^Z or whatever as the last character
> "'amount', 'n', 10, 2"
> ASCII like ' 123.45', ten ASCII characters.
| |
| N:dlzc D:aol T:com \(dlzc\) 2007-09-23, 6:55 pm |
| Dear Johan Nel:
"Johan Nel" <johan.nel555@removeall555.xsinet.co.za> wrote in
message news:1190538133.750590@vasbyt.isdsl.net...
> Hi Joe,
>
> I agree, reading the .DBF as binary would be quickest
> to do the search as all fields are actualy stored as
> ASCII in the .DBF file.
>
> 'Amount, 'N', 10, 2
>
> One should just make sure your search algorithm
> address the problem of say searching for "123.34"
> not to also return for "1123.34" etc also true.
Why would you have an asymmetric search method? Do you discount
hits on "establishment", because they are contained in
"antidisestablishmentarianism"?
Sometimes I only remember a fragment, not the whole sequence...
David A. Smith
| |
| N:dlzc D:aol T:com \(dlzc\) 2007-09-23, 6:55 pm |
| Dear Stephen Quinn:
"Stephen Quinn" <stevejqNO@SPbigpond.AMnet.au> wrote in message
news:e1mJi.1211$H22.773@news-server.bigpond.net.au...
> David
>
>
> You don't do it separately, but while looking at it as
> you step through the DBF. The following can be
> done with Flexfile 2 & 3 and FPT memoes, I've
> never looked at DBTs to see what the memo pointer contains.
OK. I am pretty sure it works like the DOS FAT table. There is
a pointer from each chunk to the next sequential chunk, and the
final chunk in a chain has a value that is terminator to the
series.
> While stepping through the DBF data read/get the
> memo pointer from the record data from it's offset
> in the record
> - decode it (it contains (Flexfile2) or is
> (FPT/Flexfile3) the start of data offset into
> the memo file)
> - move to the offset in the memo file
> - length of the data is stored at the offset
> in the memo (Flexfile & FPT)
> - search the data
If I were going to code this, I would do the dbf as binary,
concatenate a macro to append all the memofields to the binary
dbf record, and search the entire assembly {table record contents
as-stored, macro values}. No point in hand-coding machinery for
memos when the existing machinery works so simply.
David A. Smith
| |
| D.Campagna 2007-09-24, 6:55 pm |
| otto ha scritto:
> Hello
>
> I'm looking for a sample who make a global search through a DBF file (all fields) and return one array with the rec no where the word was found.
> Many do while are very slow! Are they a lib who have a fast function for that?
>
> Many thanks in advance for any help.
>
> Otto
Just to complete the suggestion list...
I have written an app that:
- every time a record is added, creates an array containing the words.
Let's call this record "sentence"
- then checks the presence of the words in a dictionary file
- then writes in a link file the record number of the word and the record
number of the sentence
- voila. When you search for a word, first check the dictionary (indexed),
then retrieve the record number and search the link file (indexed). No
matter how many records you have, you find the desired sentences in a blink,
even through zillions of records.
Not sure it was what you asked for...
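The first step, turning a record's text into a list of distinct words, might look like this. A minimal sketch, assuming Clipper 5.x; SplitWords() is a hypothetical name, words are upper-cased so that a later dictionary lookup is case-insensitive, and anything that is not a letter or digit counts as a separator. The dictionary and link files themselves are not shown.

```clipper
// Hypothetical helper: returns the distinct words of cText, upper-cased
FUNCTION SplitWords( cText )
   LOCAL aWords := {}, cWord := "", nPos, cChar

   cText := Upper( cText ) + " "      // trailing blank flushes the last word
   FOR nPos := 1 TO Len( cText )
      cChar := SubStr( cText, nPos, 1 )
      IF IsAlpha( cChar ) .OR. IsDigit( cChar )
         cWord += cChar
      ELSE
         // End of a word: keep it only if it is not in the list yet
         IF !Empty( cWord ) .AND. AScan( aWords, {|c| c == cWord } ) == 0
            AAdd( aWords, cWord )
         ENDIF
         cWord := ""
      ENDIF
   NEXT
RETURN aWords
```

For example, SplitWords( "The cat, the dog" ) returns { "THE", "CAT", "DOG" }. Each word would then be looked up (or added) in the dictionary file, and one link record written per word and sentence.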
| |
|
| On Mon, 24 Sep 2007 18:28:37 +0200, "D.Campagna" <ynnadrebyc@tiscalinet.it> wrote:
Hello and grazie.
But I used the idea from Johan.
Otto
| |
| diogenes 2007-09-25, 9:55 pm |
| On Sep 24, 9:28 am, "D.Campagna" <ynnadre...@tiscalinet.it> wrote:
> otto ha scritto:> Hello
>
>
>
>
> Just to complete the suggestion list...
> I have written an app that:
> -every time a record is added, creates an array containing the words.
> Let's call this record "sentence"
> -then checks the presence of the words in a dictionary file
> -then writes in a link file the record number of the words and record
> number of the sentence
> -voila. When you search a word, first check the dictrionary, indexed,
> then retrieve the record number and s the link file (indexed). No
> matter how many record you have, you find the sentences desired in a bip
> through zillions of records.
> Not sure it was what you asked for...
Et voila! You've just re-invented Google! :-)
-diogenes
| |
| D.Campagna 2007-09-27, 6:55 pm |
| diogenes ha scritto:
>
> Et voila! You've just re-invented Google! :-)
>
> -diogenes
>
wooooooooow!!! :)