Home > Archive > C > September 2007 > Any search pattern method recommed for mmap memory
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Any search pattern method recommed for mmap memory
|
|
| Owen Zhang 2007-09-21, 7:57 am |
| I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
| |
| Mark Bluemel 2007-09-21, 6:57 pm |
| Owen Zhang wrote:
> I have a file loaded into virtual memory space by mmap. I need to
> search some key word inside the memory opened by mmap. What is the
> best and efficient way to do?
>
Google for Boyer-Moore, I suspect...
| |
| CBFalconer 2007-09-21, 6:57 pm |
| Owen Zhang wrote:
>
> I have a file loaded into virtual memory space by mmap. I need to
> search some key word inside the memory opened by mmap. What is the
> best and efficient way to do?
You don't need the 'virtual memory'. Look the following over.
/*
Leor Zolman wrote:
> On 25 Feb 2004 07:34:40 -0800, joan@ljungh.se (spike) wrote:
>
>
> I think so. Here's a version I just threw together:
*/
/* And heres another throw -- binfsrch.c by CBF */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>
/* The difference between a binary and a text file, on read,
is the conversion of end-of-line delimiters. What those
delimiters are does not affect the action. In some cases
the presence of 0x1a EOF markers (MsDos) does.
This is a version of Knuth-Morris-Pratt algorithm. The
point of using this is to avoid any backtracking in file
reading, and thus avoiding any use of buffer arrays.
*/
size_t chrcount; /* debuggery, count of input chars, zeroed */
/* --------------------- */
/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
compared to the current char. Once the (lgh - 1)th char
has been successfully compared, the id has been found.
The array is formed by comparing id to itself. */
void initnext(int *next, const char *id, int lgh)
{
int i, j;
assert(lgh > 0);
next[0] = -1; i = 0; j = -1;
while (i < lgh) {
while ((j >= 0) && (id[i] != id[j])) j = next[j];
i++; j++;
next[i] = j;
}
#ifdef DEBUG
for (i = 0; i <= lgh; i++)
printf("id[%d] = '%c' next[%d] = %d\n",
i, id[i], i, next[i]);
#endif
} /* initnext */
/* --------------------- */
/* reads f without rewinding until either EOF or *marker
has been found. Returns EOF if not found. At exit the
last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
int j; /* char position in marker to check */
int ch; /* current char */
assert(lgh > 0);
j = 0;
while ((j < lgh) && (EOF != (ch = getc(f)))) {
chrcount++;
while ((j >= 0) && (ch != marker[j])) j = next[j];
j++;
}
return ch;
} /* kmpffind */
/* --------------------- */
/* Find marker in f, display following printing chars
up to some non printing character or EOF */
int binfsrch(const char *marker, FILE *f)
{
int *next;
int lgh;
int ch;
int items; /* count of markers found */
lgh = strlen(marker);
if (!(next = malloc(1 + lgh * sizeof *next))) {
puts("No memory");
exit(EXIT_FAILURE);
}
else {
initnext(next, marker, lgh);
items = 0;
while (EOF != kmpffind(marker, lgh, next, f)) {
/* found, take appropriate action */
items++;
printf("%d %s : \"", items, marker);
while (isprint(ch = getc(f))) {
chrcount++;
putchar(ch);
}
puts("\"");
if (EOF == ch) break;
else chrcount++;
}
free(next);
return items;
}
} /* binfsrch */
/* --------------------- */
int main(int argc, char **argv)
{
FILE *f;
f = stdin;
if (3 == argc) {
if (!(f = fopen(argv[2], "rb"))) {
printf("Can't open %s\n", argv[2]);
exit(EXIT_FAILURE);
}
argc--;
}
if (2 != argc) {
puts("Usage: binfsrch name [binaryfile]");
puts(" (file defaults to stdin text mode)");
}
else if (binfsrch(argv[1], f)) {
printf("\"%s\" : found\n", argv[1]);
}
else printf("\"%s\" : not found\n", argv[1]);
printf("%lu chars\n", (unsigned long)chrcount);
return 0;
} /* main binfsrch */
--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
--
Posted via a free Usenet account from http://www.teranews.com
| |
| Keith Thompson 2007-09-21, 6:57 pm |
| Owen Zhang <owenzhang.chicago@gmail.com> writes:
> I have a file loaded into virtual memory space by mmap. I need to
> search some key word inside the memory opened by mmap. What is the
> best and efficient way to do?
The mmap() function is not part of standard C. If it's relevant to
your question, you should ask in a system-specific newsgroup, most
likely comp.unix.programmer.
But I don't see how it's relevant. Is there some reason you think
searching a chunk of memory allocated by mmap is different from
searching any other chunk of memory?
Standard C provides some simple searching functions such as strstr().
If that doesn't suit your needs, then you probably have an algorithm
question; comp.programming is likely to be the best place to ask.
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| |
| Tor Rustad 2007-09-21, 6:57 pm |
| Owen Zhang wrote:
> I have a file loaded into virtual memory space by mmap. I need to
> search some key word inside the memory opened by mmap. What is the
> best and efficient way to do?
The on-topic answer is: strstr().
mmap() specific considerations, should rather be posted to "c.u.programmer".
--
Tor <torust [at] online [dot] no>
| |
| Richard Tobin 2007-09-21, 6:57 pm |
| In article <1190377593.366366.90200@57g2000hsv.googlegroups.com>,
Owen Zhang <owenzhang.chicago@gmail.com> wrote:
>I have a file loaded into virtual memory space by mmap. I need to
>search some key word inside the memory opened by mmap. What is the
>best and efficient way to do?
I think about all you can say is that a method that access data
sequentially rather than randomly is likely to work better, because it
matches disk access better. That's assuming you don't have any kind
of indexing of course.
-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
| |
| Keith Thompson 2007-09-21, 6:57 pm |
| Tor Rustad <tor_rustad@hotmail.com> writes:
> Owen Zhang wrote:
>
> The on-topic answer is: strstr().
[...]
Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| |
| Tor Rustad 2007-09-21, 6:57 pm |
| Keith Thompson wrote:
> Tor Rustad <tor_rustad@hotmail.com> writes:
> [...]
>
> Sure, but strstr() simply searches for a specified substring, not
> necessarily for a "keyword" (which may imply it's delimited somehow).
> Without more information, we can't be sure whether strstr will do the
> job or not.
I don't follow.. why can't OP check for extra requirements after each
match by strstr()?
OTOH, files are typically not null terminated, but I didn't bother to
check if OP needed to address this issue when using mmap().
--
Tor <torust [at] online [dot] no>
| |
| Keith Thompson 2007-09-21, 6:57 pm |
| Tor Rustad <tor_rustad@hotmail.com> writes:
> Keith Thompson wrote:
>
> I don't follow.. why can't OP check for extra requirements after each
> match by strstr()?
Yes, he could do that, but it might not be as efficient as a more
specialized search. If the keyword is sufficiently short, for
example, there might be a lot of false positives. But again, we don't
know much about the OP's requirements.
> OTOH, files are typically not null terminated, but I didn't bother to
> check if OP needed to address this issue when using mmap().
I hadn't thought of that, though it shouldn't be to hard to address
it.
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| |
| Tor Rustad 2007-09-22, 7:57 am |
| Keith Thompson wrote:
> Tor Rustad <tor_rustad@hotmail.com> writes:
>
> Yes, he could do that, but it might not be as efficient as a more
> specialized search. If the keyword is sufficiently short, for
> example, there might be a lot of false positives. But again, we don't
> know much about the OP's requirements.
There "might" be a lot of false positives, particularly if Keith is
allowed to construct that input file! :)
OTOH, let say OP want to scan C source files for keywords, will there
normally be more matches for "int" than [ \t]?
If complex matching is required, OP should rather look into using a
regular expression library, or a lex tool. No reason to reinvent the
wheel for this.
>
> I hadn't thought of that, though it shouldn't be to hard to address
> it.
I had the case in mind, where other programs access the file simultaneously.
--
Tor <torust [at] online [dot] no>
|
|
|
|
|