How does Windows perform file searching?
| natikarsunil@gmail.com 2008-01-21, 4:41 am |
| Hi,
I am doing a project in Java on file searching, so I would like to know
which search algorithms Windows uses, or how Windows performs file searching.
| |
| Roedy Green 2008-01-23, 7:26 pm |
| On Mon, 21 Jan 2008 00:15:21 -0800 (PST), natikarsunil@gmail.com
wrote, quoted or indirectly quoted someone who said :
> I am doing a project in Java on file searching, so I would like to know
> which search algorithms Windows uses, or how Windows performs file searching.
See http://mindprod.com/project/filefinder.html
See http://mindprod.com/products1.html#BOYER
What you do is spider the documents and maintain an index. Let an SQL
engine do that for you.
See http://mindprod.com/jgloss/lucene.html
A specialised index would look like this:
Each word in the "dictionary" of words used on your site is assigned a
dense integer. Ideally, the most frequently used words get low numbers
(this will help compression).
You scan your documents and assign each one a dense integer.
Now you can describe the usage of a word by its word number followed by
a list of the document numbers using that word, or by pairs of document
number and count of times used, sorted with the document that uses the
word most first.
You can then create a big vector, in RAM or cached in RAM, to look up
by word number to get the corresponding vector of usage, also cached
in RAM.
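A minimal Java sketch of the index described above may make the layout concrete. The class and method names (InvertedIndexSketch, Posting, addDocument, lookup) are made up for illustration. It assumes plain in-memory text, splits on non-word characters, and hands out word numbers in order of first appearance rather than by frequency, so it does not implement the compression-friendly numbering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: dense word numbers, dense document numbers,
// and per-word lists of (document number, count) pairs.
class InvertedIndexSketch {
    // One usage entry: a document number and how often the word occurs in it.
    static final class Posting {
        final int docId;
        final int count;
        Posting(int docId, int count) { this.docId = docId; this.count = count; }
    }

    private final Map<String, Integer> wordIds = new HashMap<>();         // word -> word number
    private final List<String> docNames = new ArrayList<>();              // document number -> name
    private final List<Map<Integer, Integer>> usage = new ArrayList<>();  // word number -> (doc -> count)

    // Assigns the next dense document number and counts every word in the text.
    int addDocument(String name, String text) {
        int docId = docNames.size();
        docNames.add(name);
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            Integer wordId = wordIds.get(token);
            if (wordId == null) {
                wordId = usage.size();              // next dense word number
                wordIds.put(token, wordId);
                usage.add(new HashMap<>());
            }
            usage.get(wordId).merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    // Returns the usage list for a word, heaviest user first (ties by document number).
    List<Posting> lookup(String word) {
        List<Posting> result = new ArrayList<>();
        Integer wordId = wordIds.get(word.toLowerCase());
        if (wordId == null) return result;
        for (Map.Entry<Integer, Integer> e : usage.get(wordId).entrySet()) {
            result.add(new Posting(e.getKey(), e.getValue()));
        }
        result.sort(Comparator.comparingInt((Posting p) -> p.count).reversed()
                .thenComparingInt(p -> p.docId));
        return result;
    }

    String docName(int docId) { return docNames.get(docId); }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("a.txt", "the cat sat on the mat");
        index.addDocument("b.txt", "the dog");
        for (Posting p : index.lookup("the")) {
            System.out.println(index.docName(p.docId) + " uses it " + p.count + " time(s)");
        }
    }
}
```

Sorting at lookup time keeps the sketch short; a real index would more likely store each usage list already sorted.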
See http://mindprod.com/jgloss/pod.html
If you have a farm of computers, each one can take a section of the
alphabet. You funnel your request to the proper server.
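As a rough illustration of funnelling a request by section of the alphabet, the sketch below maps a word's first letter to one of several servers. The class name AlphabetRouterSketch and the server names are hypothetical. It assumes plain a-z words and sends anything else to the first server. Equal-width letter ranges will not give equal load, since some initial letters are far more common than others.

```java
// Hypothetical sketch: route a lookup to the server owning that part of the alphabet.
class AlphabetRouterSketch {
    private final String[] servers;

    AlphabetRouterSketch(String... servers) {
        if (servers.length == 0) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = servers.clone();
    }

    // Splits a-z into servers.length roughly equal letter ranges.
    String serverFor(String word) {
        if (word.isEmpty()) return servers[0];
        char c = Character.toLowerCase(word.charAt(0));
        if (c < 'a' || c > 'z') return servers[0];   // digits, punctuation, non-ASCII
        int slot = (c - 'a') * servers.length / 26;
        return servers[slot];
    }

    public static void main(String[] args) {
        AlphabetRouterSketch router = new AlphabetRouterSketch("server0", "server1");
        System.out.println(router.serverFor("apple"));
        System.out.println(router.serverFor("zebra"));
    }
}
```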
--
Roedy Green, Canadian Mind Products
The Java Glossary, http://mindprod.com
| |
| Roedy Green 2008-01-23, 10:26 pm |
| On Thu, 24 Jan 2008 00:42:20 GMT, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :
>word number: then a list of document numbers using that word.
>or pairs document number/count of times used. Sorted with document
>using the most first.
Another idea you might play with is logging "update transactions", which
then get sorted and processed sequentially in a batch so that you have
RAM locality.
You need a HashMap to look up words and another to look up files.
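One way to picture the batching idea, as a sketch only: occurrences are logged as (word number, file number) pairs, with two HashMaps handing out the numbers, and the log is sorted before being applied so that all updates to one word's list are adjacent. The names BatchUpdateSketch, logOccurrence and applyBatch are invented for this example, and everything is held in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: log updates, sort them, then apply them in one sequential pass.
class BatchUpdateSketch {
    private final Map<String, Integer> wordIds = new HashMap<>();   // word -> word number
    private final Map<String, Integer> fileIds = new HashMap<>();   // file name -> file number
    private final List<int[]> log = new ArrayList<>();              // pending {wordId, fileId}
    private final Map<Integer, TreeSet<Integer>> postings = new HashMap<>();

    // Returns the existing number for the key, or assigns the next dense one.
    private static int idFor(Map<String, Integer> ids, String key) {
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size();
            ids.put(key, id);
        }
        return id;
    }

    // Records an "update transaction" without touching the index yet.
    void logOccurrence(String word, String file) {
        log.add(new int[] { idFor(wordIds, word), idFor(fileIds, file) });
    }

    // Sorts the log by word then file, so each word's list is updated in one run.
    int applyBatch() {
        log.sort(Comparator.<int[]>comparingInt(t -> t[0]).thenComparingInt(t -> t[1]));
        int processed = 0;
        for (int[] t : log) {
            postings.computeIfAbsent(t[0], k -> new TreeSet<>()).add(t[1]);
            processed++;
        }
        log.clear();
        return processed;
    }

    // File numbers known to contain the word, in ascending order.
    List<Integer> filesFor(String word) {
        Integer wordId = wordIds.get(word);
        if (wordId == null || !postings.containsKey(wordId)) return new ArrayList<>();
        return new ArrayList<>(postings.get(wordId));
    }

    Integer fileId(String file) { return fileIds.get(file); }
}
```

The TreeSet silently drops repeated (word, file) pairs; keeping per-file counts instead would need a map in its place.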
--
Roedy Green, Canadian Mind Products
The Java Glossary, http://mindprod.com