|
| Generic Usenet Account wrote:
> Generic Usenet Account wrote:
>
> Before I get slammed again, let me clarify the definition further. A
> weak pattern is a collection (not set, since set implies ordering) of
> symbols, not necessarily contiguous, which repeats itself more than
> once in a data stream. Even though contiguity of symbols within a
> pattern is not important, the order of the symbols within the pattern
> is important.
>
> For example, in the data stream 2-4-7-8-0-1-2-3-5-4-6-0-7 (slighly
> modified version of the original data stream i.e.
> 2-4-7-8-0-1-2-3-5-4-6-7-0) the weak pattern 2-4-7-0 does not exist.
> However, the weak patterns 2-4-7 and 2-4-0 exist in the data stream.
>
> Needless to say, the algorithm does not assume prior knowledge of these
> patterns. The algorithm should be able to discover them.
>
> -Song
Is the entire data stream known before screening for weak pattern? If
so, we can take 50% of the length of the stream, and append each symbol
with a (\d)* in the middle and compare it with the rest of the string.
(50% is kinda obvious I guess). We can keep reducing the pattern size
and continue matching.
If the stream is going to change.. we can use the largest pattern and
increase size till 50% from that to match for newer patterns. However,
I don't think this can be done efficiently in real time.
|
|