Author How windows perform File Searching?
natikarsunil@gmail.com

2008-01-21, 4:41 am

Hi,
I am doing a project in Java on file searching, so I want to know which
search algorithms Windows uses, or how Windows performs file searching.
Roedy Green

2008-01-23, 7:26 pm

On Mon, 21 Jan 2008 00:15:21 -0800 (PST), natikarsunil@gmail.com
wrote, quoted or indirectly quoted someone who said :

> i am doing a project in java for FILE SEARCHING.So i want know search
>algorithms
>used by windows or how windows perform file searching.


See http://mindprod.com/project/filefinder.html
See http://mindprod.com/products1.html#BOYER

What you do is spider the documents and maintain an index. Let an SQL
engine do that for you.

See http://mindprod.com/jgloss/lucene.html

A specialised index would look like this:

Each word in the "dictionary" of words used on your site is assigned a
dense integer. Ideally the most frequently used words get the lowest
numbers (small numbers take fewer bytes in a variable-length encoding,
which will help compression).

You scan your documents and assign each one a dense integer.
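
A minimal sketch of that numbering step, assuming the documents are already
in memory as strings and that a "word" is any run of ASCII letters or digits.
The class and method names (WordNumbering, countWords, assignIds) are made up
for illustration. Document numbers can be handed out the same way, in scan
order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: assigns dense integer ids to words, most frequent first. */
final class WordNumbering {

    /** Counts how often each word occurs across all documents. */
    static Map<String, Integer> countWords(List<String> documents) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : documents) {
            // Assumption: words are runs of ASCII letters/digits, compared case-insensitively.
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Gives id 0 to the most frequent word, 1 to the next, and so on; ties break alphabetically. */
    static Map<String, Integer> assignIds(Map<String, Integer> counts) {
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // higher count first
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        Map<String, Integer> ids = new HashMap<>();
        for (int i = 0; i < words.size(); i++) {
            ids.put(words.get(i), i);
        }
        return ids;
    }
}
```

This takes two passes (count, then number), which is the price of getting
frequency-ordered ids. If you do not care about compression, you can number
words in order of first appearance in a single pass.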

Now you can describe the usage of a word by its word number, followed by
either a list of the document numbers using that word, or a list of
pairs of document number and count of times used, sorted so that the
document using the word most comes first.

You can then create a big vector, in RAM or cached in RAM, indexed by
word number, to look up the corresponding vector of usage, also cached
in RAM.
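
One way the vector-of-vectors might look in Java, as a rough sketch that
keeps everything in RAM. PostingIndex, Posting, add, finish and lookup are
illustrative names, not part of any library, and a real index would store
the usage lists far more compactly than a list of objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-RAM index: a vector indexed by word number, holding usage vectors. */
final class PostingIndex {

    /** One usage entry: a document number and how many times the word occurs in it. */
    static final class Posting {
        final int docId;
        final int count;

        Posting(int docId, int count) {
            this.docId = docId;
            this.count = count;
        }
    }

    // Outer list is indexed by word number; each inner list is that word's usage vector.
    private final List<List<Posting>> postingsByWord = new ArrayList<>();

    /** Records that document docId uses word wordId count times. */
    void add(int wordId, int docId, int count) {
        while (postingsByWord.size() <= wordId) {
            postingsByWord.add(new ArrayList<>());
        }
        postingsByWord.get(wordId).add(new Posting(docId, count));
    }

    /** Sorts every usage vector so the document using the word most comes first. */
    void finish() {
        for (List<Posting> postings : postingsByWord) {
            postings.sort((a, b) -> Integer.compare(b.count, a.count));
        }
    }

    /** Returns the usage vector for a word number, or an empty list if the number is unknown. */
    List<Posting> lookup(int wordId) {
        if (wordId < 0 || wordId >= postingsByWord.size()) {
            return new ArrayList<>();
        }
        return postingsByWord.get(wordId);
    }
}
```

Because word numbers are dense, the lookup is a plain array index rather
than a hash probe, which is the point of numbering the words first.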

See http://mindprod.com/jgloss/pod.html

If you have a farm of computers, each one can take a section of the
alphabet. You funnel your request to the proper server.
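
A small sketch of that routing, assuming the alphabet is cut into equal
slices of a-z by first letter. ShardRouter and the server names passed to
it are hypothetical, and sending words that do not start with a letter to
the first server is just a choice made here.

```java
/** Hypothetical router: maps a word to the server owning its slice of the alphabet. */
final class ShardRouter {
    private final String[] servers; // assumed host names, one per alphabet slice

    ShardRouter(String... servers) {
        if (servers.length == 0) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = servers.clone();
    }

    /** Picks the server responsible for the word's first letter. */
    String serverFor(String word) {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("empty word");
        }
        char first = Character.toLowerCase(word.charAt(0));
        if (first < 'a' || first > 'z') {
            return servers[0]; // assumption: digits and other characters go to the first server
        }
        int letter = first - 'a';                     // 0..25
        return servers[letter * servers.length / 26]; // equal-width slices of the alphabet
    }
}
```

Equal slices of the alphabet will not give equal load, since far more words
start with "s" than with "x", so in practice you would probably tune the
slice boundaries by word frequency.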
--
Roedy Green, Canadian Mind Products
The Java Glossary, http://mindprod.com
Roedy Green

2008-01-23, 10:26 pm

On Thu, 24 Jan 2008 00:42:20 GMT, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>word number: then a list of document numbers using that word.
>or pairs document number/count of times used. Sorted with document
>using the most first.


Another idea you might play with is logging "update transactions",
which then get sorted and processed sequentially in a batch, so that
you have RAM locality.
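
Here is roughly what I mean, as a sketch only. It assumes an update is a
(word number, document number, count) triple and that sorting by word
number is what gives the locality, since all changes to one word's usage
list then arrive together. UpdateLog, Update, log and drainSorted are
made-up names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical log that collects index updates and hands them back in sorted order. */
final class UpdateLog {

    /** One logged change: word wordId occurs count times in document docId. */
    static final class Update {
        final int wordId;
        final int docId;
        final int count;

        Update(int wordId, int docId, int count) {
            this.wordId = wordId;
            this.docId = docId;
            this.count = count;
        }
    }

    private final List<Update> pending = new ArrayList<>();

    /** Appends an update to the log without touching the index. */
    void log(int wordId, int docId, int count) {
        pending.add(new Update(wordId, docId, count));
    }

    /** Empties the log and returns the batch sorted by word number, then document number. */
    List<Update> drainSorted() {
        List<Update> batch = new ArrayList<>(pending);
        pending.clear();
        batch.sort((a, b) -> a.wordId != b.wordId
                ? Integer.compare(a.wordId, b.wordId)
                : Integer.compare(a.docId, b.docId));
        return batch;
    }
}
```

The caller then walks the returned batch from start to finish and applies
each update to the index, visiting each word's usage list once per batch
instead of hopping around memory.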

You need a HashMap to look up words and another to look up files.
--
Roedy Green, Canadian Mind Products
The Java Glossary, http://mindprod.com