About enwik and AI
| niels.froehling@seies.de 2006-12-05, 6:55 pm |
Hi;
We've read and discussed a lot about the purpose of the enwik test
prepared by Malcom. Opinions differ on whether it leads us
in the AI direction.
It is proposed that the 1 GB enwik contains roughly the information
a human learns over some years, which makes it a sort of
kick-start repository from which further text can be predicted.
The weakness I see in judging only the compression results on the
enwik files is that they measure and quantify interpolation, but
not extrapolation. Extrapolation would clearly favor different algorithms
than today's, and I believe it would point much more directly in the AI direction.
My idea, or proposal, is simple, but it gives evidence about language-learning
(and so extrapolation) efficiency, instead of language-description
efficiency (which is interpolation). So far the compressors tested are
allowed to be tuned for the whole file: explicit descriptive
information about the whole corpus is available beforehand. This is the
crucial point and the difference from Shannon's tests: the
test subjects did not know the message beforehand.
I suggest the following procedure:
1) compress(algorithmX, enwik) = enwik-sizeX
2) compress(algorithmX, enwik + 1 paragraph of Shakespeare as postfix) =
ew+s-sizeX
3) calculate bits per character not on enwik, but on the Shakespeare paragraph
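The three steps above can be sketched in Python. This is a minimal sketch under stated assumptions: LZMA from the standard library stands in for "algorithmX", step 3 is read as the extra compressed bits divided by the length of the appended paragraph, and the function names are hypothetical. For ASCII text, bytes and characters coincide.

```python
import lzma

def compressed_size(data: bytes) -> int:
    """Compressed size in bytes. LZMA is an assumed stand-in for algorithmX."""
    return len(lzma.compress(data, preset=9))

def extrapolation_bpc(corpus: bytes, probe: bytes) -> float:
    """Bits per byte spent on `probe` when it is appended to `corpus`."""
    enwik_size = compressed_size(corpus)             # step 1: enwik-sizeX
    ews_size = compressed_size(corpus + probe)       # step 2: ew+s-sizeX
    return 8 * (ews_size - enwik_size) / len(probe)  # step 3: bpc on the probe only

def interpolation_bpc(corpus: bytes) -> float:
    """Ordinary bits per byte over the corpus itself, for comparison."""
    return 8 * compressed_size(corpus) / len(corpus)

if __name__ == "__main__":
    corpus = b"The quick brown fox jumps over the lazy dog. " * 200  # toy stand-in for enwik
    probe = b"Shall I compare thee to a summer's day? Thou art more lovely and more temperate."
    print(f"interpolation: {interpolation_bpc(corpus):.2f} bpc")
    print(f"extrapolation: {extrapolation_bpc(corpus, probe):.2f} bpc")
```

A general-purpose compressor like LZMA only looks back over a bounded dictionary window (64 MiB at preset 9), so on the full 1 GB enwik it could draw only on the tail of the corpus when coding the appended paragraph.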
I took Shakespeare only as an example here. The postfixed message
should be short, so that the algorithm cannot adapt to it too quickly
(interpolation again) and is instead forced to use enwik as a
kick-starting training corpus. I would also suggest using content
whose style is deliberately unlike Wikipedia's, to further
stress the algorithm to show a general language-learning solution
instead of a tuned one (interpolation again).
This approach would uncover the /better/ language-learning algorithm,
or might lead to the selection of a more general English text corpus
(which is hard anyway).
I also predict that the bits per character of the extrapolation will be
higher, if not much higher, than that of the interpolation, and
will perhaps agree more closely with Shannon's estimate of about 1.3 bpc.
Ciao
Niels
| |
| cr88192 2006-12-05, 6:55 pm |
<niels.froehling@seies.de> wrote in message
news:1165345888.060619.10430@j72g2000cwa.googlegroups.com...
> Hi;
>
> We've read and discussed a lot about the purpose of the enwik test
> prepared by Malcom. Opinions differ on whether it leads us
> in the AI direction.
>
> It is proposed that the 1 GB enwik contains roughly the information
> a human learns over some years, which makes it a sort of
> kick-start repository from which further text can be predicted.
>
> The weakness I see in judging only the compression results on the
> enwik files is that they measure and quantify interpolation, but
> not extrapolation. Extrapolation would clearly favor different algorithms
> than today's, and I believe it would point much more directly in the AI direction.
>
> My idea, or proposal, is simple, but it gives evidence about language-learning
> (and so extrapolation) efficiency, instead of language-description
> efficiency (which is interpolation). So far the compressors tested are
> allowed to be tuned for the whole file: explicit descriptive
> information about the whole corpus is available beforehand. This is the
> crucial point and the difference from Shannon's tests: the
> test subjects did not know the message beforehand.
> I suggest the following procedure:
>
> 1) compress(algorithmX, enwik) = enwik-sizeX
> 2) compress(algorithmX, enwik + 1 paragraph of Shakespeare as postfix) =
> ew+s-sizeX
> 3) calculate bits per character not on enwik, but on the Shakespeare paragraph
>
> I took Shakespeare only as an example here. The postfixed message
> should be short, so that the algorithm cannot adapt to it too quickly
> (interpolation again) and is instead forced to use enwik as a
> kick-starting training corpus. I would also suggest using content
> whose style is deliberately unlike Wikipedia's, to further
> stress the algorithm to show a general language-learning solution
> instead of a tuned one (interpolation again).
>
> This approach would uncover the /better/ language-learning algorithm,
> or might lead to the selection of a more general English text corpus
> (which is hard anyway).
>
> I also predict that the bits per character of the extrapolation will be
> higher, if not much higher, than that of the interpolation, and
> will perhaps agree more closely with Shannon's estimate of about 1.3 bpc.
>
how about a harder challenge:
not only should the program compress the data, but it should also be able to
generate meaningful output as well.
given, say, a particular word, it will generate some statements related to
the word, and maybe use the statements emitted to generate more statements,
and so on...
however, this would be done absent any real foreknowledge of grammar or
statements...
probably such a tool would work at the token rather than the character
level, and for consistency reasons may rip out all the xml, but who
knows?...
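As a sketch of that preprocessing (assuming enwik-style XML, where article text sits between tags and uses entities like &amp;), the markup can be stripped and the text split into word-level tokens. The regular expressions and the name `tokens` are illustrative assumptions, not part of any existing tool.

```python
import html
import re

# Hypothetical patterns; what counts as a "token" here is an assumption.
TAG = re.compile(r"<[^>]+>")                            # an XML tag such as <page> or </text>
TOKEN = re.compile(r"[A-Za-z]+|[0-9]+|[^\sA-Za-z0-9]")  # words, numbers, single symbols

def tokens(xml_text: str) -> list[str]:
    """Strip XML tags, decode entities like &amp;, and split into lowercase tokens."""
    text = TAG.sub(" ", xml_text)   # rip out the markup first
    text = html.unescape(text)      # then turn &amp;, &lt; ... back into characters
    return [t.lower() for t in TOKEN.findall(text)]

if __name__ == "__main__":
    print(tokens("<page><title>Snow</title><text>Snow is cold &amp; white.</text></page>"))
```

Non-ASCII letters come out as single-character tokens with these patterns, one of several simplifications a real tool would need to revisit.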
now, if we can get meaningful output, we can 'know' the program has at least
some level of understanding (and is not just a good compressor...).
> Ciao
> Niels
>
| |
| James A. Bowery 2006-12-05, 6:55 pm |
niels.froehling@seies.de wrote:
> The weakness I see in judging only the compression results on the
> enwik files is that they measure and quantify interpolation, but
> not extrapolation. Extrapolation would clearly favor different algorithms
> than today's, and I believe it would point much more directly in the AI direction.
You need to bone up on the basic theory of inductive inference à la
Solomonoff:
Around 1960, Ray Solomonoff founded the theory of universal inductive
inference, the theory of prediction based on observations. Given is the
beginning of some sequence of symbols. WHICH SYMBOL WILL BE NEXT
[emphasis -- JAB]? Solomonoff's theory provides an answer that is
optimal in a certain sense.
http://en.wikipedia.org/wiki/Inductive_inference
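For readers following the Solomonoff reference, the standard definitions (summarized here, not quoted from the linked page) make the link between prediction and compression explicit:

```latex
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)}, \qquad
M(x_{n+1} \mid x_{1:n}) = \frac{M(x_{1:n}\, x_{n+1})}{M(x_{1:n})}, \qquad
-\log_2 M(x) \le Km(x)
```

Here U is a universal monotone machine, U(p) = x* means the output of program p begins with x (the sum runs over minimal such programs), l(p) is the length of p in bits, and Km(x) is the length of the shortest such program. The inequality holds because that shortest program is one of the terms in the sum, so a sequence with a short description is also one the predictor finds probable; this is the usual argument for treating prediction and compression as one problem.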
| |
| niels.froehling@seies.de 2006-12-05, 6:55 pm |
> > The weakness I see in judging only the compression results on the
> You need to bone up on the basic theory of inductive inference à la
> Solomonoff:
What does 'bone up' mean? 'build upon'?
> Around 1960, Ray Solomonoff founded the theory of universal inductive
> inference, the theory of prediction based on observations. Given is the
> beginning of some sequence of symbols. WHICH SYMBOL WILL BE NEXT
> [emphasis -- JAB]? Solomonoff's theory provides an answer that is
> optimal in a certain sense.
>
> http://en.wikipedia.org/wiki/Inductive_inference
You quoted:
^^^^^^
^^^^^^^^^^^^
You have actually given a scientific foundation to my statement, which
asks for an exploration of the extrapolation qualities of compression
algorithms (in this case on the enwik files). Such an exploration could
then show whether AI implies compression, compression implies AI, or neither.
Induction is not necessary for the enwik tests and is probably worse than
a descriptive model. I could write a compressor that already contains
the super-optimal representation of the enwik file; where is my
compressor predicting or inducing? The test actually supports (in
the sense of motivation) _only_ observation and not prediction.
Nothing gives an algorithm a reason to /learn/ those 1 GB beyond
reducing their description. A human system has the purpose of doing
something with what it has learned; that's why it very probably uses very
different algorithms for reuse than, for example, a compressor does. As
long as a compressor is given no purpose, it will not break through the wall
separating it from AI.
I mean, it all depends on whether one thinks that producing diverse and
non-repetitive output is not a requirement of AI ... and that humans are
possibly only overcomplex babble machines. ;)
Ciao
Niels
| |
| Matt Mahoney 2006-12-05, 6:55 pm |
niels.froehling@seies.de wrote:
>
> What does 'bone up' mean? 'build upon'?
>
>
> You quoted:
>
> ^^^^^^
>
> ^^^^^^^^^^^^
>
> You have actually given a scientific foundation to my statement, which
> asks for an exploration of the extrapolation qualities of compression
> algorithms (in this case on the enwik files). Such an exploration could
> then show whether AI implies compression, compression implies AI, or neither.
>
> Induction is not necessary for the enwik tests and is probably worse than
> a descriptive model. I could write a compressor that already contains
> the super-optimal representation of the enwik file; where is my
> compressor predicting or inducing? The test actually supports (in
> the sense of motivation) _only_ observation and not prediction.
> Nothing gives an algorithm a reason to /learn/ those 1 GB beyond
> reducing their description. A human system has the purpose of doing
> something with what it has learned; that's why it very probably uses very
> different algorithms for reuse than, for example, a compressor does. As
> long as a compressor is given no purpose, it will not break through the wall
> separating it from AI.
>
> I mean, it all depends on whether one thinks that producing diverse and
> non-repetitive output is not a requirement of AI ... and that humans are
> possibly only overcomplex babble machines. ;)
>
> Ciao
> Niels
One problem with a differential compression test is that competitors
could game the system (given there is prize money) by deliberately
degrading compression when the test text is not appended. But even if
we take safeguards against this, I think it will not make any
difference. Simple compression is still a valid test. A compressor
can either include a hand built language model, or it can learn the
model from the input. The data is so huge that I think the second
method will be preferred.
The 1 GB size I chose depends on my assumption that a language model
can be learned from unlabeled text, i.e. positive examples only. I
think it can be. By a language model, I mean 3 kinds of rules.
1. Lexical - what is a word and what isn't?
2. Semantics - which words can be meaningfully used together, like
"snow" and "ice".
3. Syntax - rules that govern word order and sentence structure.
Lexical: it is possible to build a dictionary from a corpus even if the
rules for placing word boundaries are unknown. The rules can be
learned from n-gram statistics only. I did some research on this in
2000: http://cs.fit.edu/~mmahoney/dissertation/lex1.html
This allows a program to learn root words and suffixes.
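The linked page describes Matt's own method, which is not reproduced here. As a different, well-known illustration that boundaries can be found from n-gram statistics alone, the sketch below uses Harris-style successor variety: a boundary is placed after a context that can be followed by many different characters. The function names, the context length `k`, and the `threshold` are assumptions.

```python
from collections import defaultdict

def successor_variety(corpus: str, k: int) -> dict[str, set[str]]:
    """Map each k-character context to the set of characters seen right after it."""
    followers: dict[str, set[str]] = defaultdict(set)
    for i in range(len(corpus) - k):
        followers[corpus[i:i + k]].add(corpus[i + k])
    return followers

def segment(text: str, corpus: str, k: int = 2, threshold: int = 3) -> list[str]:
    """Split `text` wherever the preceding k characters have at least
    `threshold` distinct successors in `corpus` (taken as a likely word end)."""
    sv = successor_variety(corpus, k)
    words, start = [], 0
    for i in range(k, len(text)):
        if len(sv.get(text[i - k:i], ())) >= threshold:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words

if __name__ == "__main__":
    # Toy corpus: every pair of four words, written without spaces.
    vocab = ["cat", "dog", "pig", "ant"]
    corpus = "".join(a + b for a in vocab for b in vocab)
    print(segment("dogpigcat", corpus))
```

Inside a word the next character is nearly determined, while at a word end any word can follow; that contrast is the only signal the sketch uses.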
Semantics: words that appear close together usually have related
meanings. Techniques like LSA can find clusters of words related by
topic: if "snow" appears near "ice" and "ice" appears near "cold", then
you can predict that "snow" will appear near "cold", even if this pair
was not observed previously.
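A minimal sketch of the "snow"/"ice"/"cold" reasoning, using plain co-occurrence vectors and cosine similarity rather than full LSA (which would additionally apply a truncated SVD to reduce dimensionality). The window size and function names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def cooccurrence(sentences: list[list[str]], window: int = 2) -> dict[str, Counter]:
    """For each word, count the words appearing within `window` positions of it."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for words in sentences:
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][words[j]] += 1
    return vectors

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    sents = [["snow", "and", "ice"], ["ice", "is", "cold"], ["the", "dog", "barked"]]
    vecs = cooccurrence(sents)
    print(cosine(vecs["snow"], vecs["cold"]), cosine(vecs["snow"], vecs["dog"]))
```

"snow" and "cold" never co-occur here, yet their vectors overlap through the shared neighbor "ice", which is the transitive effect described above.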
Syntax: parts of speech (nouns, verbs, etc) of novel words can be
learned from context. For example, given a string "the X is", you know
that X is a noun and can predict sequences like "Xs are".
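The "the X is" heuristic can be sketched as a single context pattern. This is a toy under obvious assumptions: one hypothetical pattern, and a naive "+s" plural that ignores irregular forms.

```python
import re
from collections import Counter

# Hypothetical sketch: one context pattern and a naive "+s" plural rule.
SLOT = re.compile(r"\bthe (\w+) is\b")

def noun_candidates(text: str) -> Counter:
    """Count the words that fill the slot in 'the X is' (treated as nouns)."""
    return Counter(SLOT.findall(text.lower()))

def predicted_continuations(word: str, nouns: Counter) -> list[str]:
    """For a word seen in a noun slot, predict the plural pattern 'Xs are'."""
    return [f"{word}s are"] if nouns[word] > 0 else []

if __name__ == "__main__":
    nouns = noun_candidates("The wug is small. Then the wug is gone.")
    print(predicted_continuations("wug", nouns))
```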
So if language is learnable in this way, then the best design for a
compressor/language model should be a relatively simple set of rules
and a lot of data, regardless of whether the goal is compressed size +
decompressor or differential compression. Whether the top compressors
in the Hutter prize will have this form years from now remains to be
seen.
-- Matt Mahoney
| |
| Matt Mahoney 2006-12-05, 6:55 pm |
cr88192 wrote:
> <niels.froehling@seies.de> wrote in message
> news:1165345888.060619.10430@j72g2000cwa.googlegroups.com...
>
> how about a harder challenge:
>
> not only should the program compress the data, but it should also be able to
> generate meaningful output as well.
>
> given, say, a particular word, it will generate some statements related to
> the word, and maybe use the statements emitted to generate more statements,
> and so on...
>
> however, this would be done absent any real foreknowledge of grammar or
> statements...
>
>
> probably such a tool would work at the token rather than the character
> level, and for consistency reasons may rip out all the xml, but who
> knows?...
>
>
> now, if we can get meaningful output, we can 'know' the program has at least
> some level of understanding (and is not just a good compressor...).
I think these goals are equivalent. The problem is to find the
probability distribution over strings of natural language text. The
closer you get, the better you can compress. Also, if you have such a
model, you can use it to generate text (as in the Turing test).
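The link between probability and compressed size is the ideal code length, the sum of -log2 p over the symbols, which an arithmetic coder can approach to within a couple of bits. The sketch below computes it for a hypothetical adaptive order-k character model with add-one smoothing over an assumed 256-symbol alphabet, so a better model visibly yields fewer bits.

```python
import math
from collections import defaultdict

def code_length_bits(text: str, order: int, alphabet: int = 256) -> float:
    """Ideal code length in bits of `text` under an adaptive order-`order` model.

    Each symbol costs -log2 p(c | previous `order` chars), with add-one
    (Laplace) smoothing over an assumed alphabet of `alphabet` symbols.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]
        p = (counts[ctx][ch] + 1) / (totals[ctx] + alphabet)
        bits -= math.log2(p)
        counts[ctx][ch] += 1   # update only after coding, as a decoder could
        totals[ctx] += 1
    return bits

if __name__ == "__main__":
    sample = "ab" * 2000
    for k in (0, 1):
        print(k, round(code_length_bits(sample, k) / len(sample), 3), "bits/char")
```

The order-1 model learns that "a" is always followed by "b", so it ends up spending far fewer bits than the order-0 model on the same string.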
Right now if you introduce some damage to a compressed text file and
decompress it with the best compressors, you will start off generating
random but readable text, but after a few words the output becomes
degenerate, such as the same byte repeating over and over. This
suggests you have a lot of room to improve the compression.
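The degeneration described above can be reproduced in miniature. The sketch below is not the experiment from the post (decoding damaged compressed data amounts to sampling from the model). Instead it generates greedily from a tiny hypothetical order-2 character model. Because the next character depends only on the last two characters, greedy output must eventually cycle.

```python
from collections import Counter, defaultdict

def train(text: str, order: int) -> dict[str, Counter]:
    """Count which character follows each `order`-character context."""
    model: dict[str, Counter] = defaultdict(Counter)
    for i in range(order, len(text)):
        model[text[i - order:i]][text[i]] += 1
    return model

def generate_greedy(model: dict[str, Counter], seed: str, order: int, n: int) -> str:
    """Append up to n characters, always taking the most frequent successor."""
    out = seed
    for _ in range(n):
        followers = model.get(out[-order:])
        if not followers:       # unseen context: stop
            break
        out += followers.most_common(1)[0][0]
    return out

if __name__ == "__main__":
    model = train("the cat sat on the mat. the dog sat on the log.", order=2)
    print(generate_greedy(model, "th", order=2, n=40))
    # the cat on the cat on the cat on the cat o
```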
-- Matt Mahoney
| |
niels.froehling@seies.de wrote:
> Hi;
>
> We've read and discussed a lot about the purpose of the enwik test
> prepared by Malcom. Opinions differ on whether it leads us
> in the AI direction.
>
> It is proposed that the 1 GB enwik contains roughly the information
> a human learns over some years, which makes it a sort of
> kick-start repository from which further text can be predicted.
>
> The weakness I see in judging only the compression results on the
> enwik files is that they measure and quantify interpolation, but
> not extrapolation. Extrapolation would clearly favor different algorithms
> than today's, and I believe it would point much more directly in the AI direction.
>
Uhmm... how about beginning by modeling human memory/knowledge, and
distinguishing types of memory - short/long term, lossy or lossless, etc.
But my real "why are you doing this" is: it seems to me that human
knowledge - and memory - is mainly based on sensory experience. Even
when you read a book, visual memory plays a role; imagination plays
sounds or pictures a face - i.e. the brain is translating the text into
self-induced sensory experiences.
It is even more so in a child. That's basically only sensory perception
and processing - without being able to properly communicate.
What's the point in trying to see whether it's possible to "breed" an AI
when there is no such complex mechanism to begin with?
The question is: would you form your own thought/mind if you were
senseless right from birth - with someone just feeding
"character" impulses (the input text) to your brain?
I might be wrong but the answer sounds like... no. :)
Best,
E.
| |
| cr88192 2006-12-05, 9:55 pm |
"erpy" <info@forwardgames.com> wrote in message
news:45760c2f$0$3208$4fafbaef@reader2.news.tin.it...
> niels.froehling@seies.de wrote:
>
> Uhmm... how about beginning by modeling human memory/knowledge, and
> distinguishing types of memory - short/long term, lossy or lossless, etc.
> But my real "why are you doing this" is: it seems to me that human
> knowledge - and memory - is mainly based on sensory experience. Even
> when you read a book, visual memory plays a role; imagination plays sounds
> or pictures a face - i.e. the brain is translating the text into
> self-induced sensory experiences.
this is true, to an extent, and for a certain portion of people.
there are many people who either never had, or have since lost, nearly all sense
of "seeing" what they read, or even of hearing it spoken as a faint
echo or whatever. the text, words, and grammar are fundamental to them, but
the imagery and sensory experience may be absent.
then on the other hand, we have people who can vividly imagine what they
see, but can by no means remember the text, and grammatical or lexical
analysis seems like some kind of horror to them.
I guess I fall somewhere in between: sometimes I imagine, and sometimes I
hear, but it depends on the content, and it is often not much more than a
shadow and a slight echo (this seems stronger when I write, though...).
I seem to be surprisingly good at analyzing grammar, as for
the most part the rules of grammar seem "obvious"; then again, it is more of
a hassle to analyze grammar than not to...
usually when reading code, or a lot of technical material, all is silent...
> It is even more so in a child. That's basically only sensory perception and
> processing - without being able to properly communicate.
> What's the point in trying to see whether it's possible to "breed" an AI when
> there is no such complex mechanism to begin with?
>
well, an AI, for a machine, is clearly different from natural intelligence.
so, based on text and data, an AI "can" exist, but will it be really at all
like a human? probably not.
AI will be to human intelligence what ASCII is to handwritten notes.
it will be clear and legible, and easy to manipulate.
will we have cursive, or "feeling", or the ability to include little
drawings or dot letters with little hearts? no...
it will be approximate, and have some useful advantages, but will be clearly
enough something completely different...
> The question is: would you form your own thought/mind if you were
> senseless right from birth - with someone just feeding
> "character" impulses (the input text) to your brain?
>
> I might be wrong but the answer sounds like... no. :)
>
well, probably a person's brain would still form, but they would
end up being more like a machine, maybe eventually rising to the level of a
Deus Ex storyline, or maybe living their life as little more than a data
processor and never really becoming aware of their own existence outside of
the machine, or going completely insane, or maybe turning out like the
monkeys raised in complete isolation, who knows...
> Best,
> E.
>
>
| |
cr88192 wrote:
> "erpy" <info@forwardgames.com> wrote in message
> news:45760c2f$0$3208$4fafbaef@reader2.news.tin.it...
>
>
> this is true, to an extent, and for a certain portion of people.
>
> there are many people who either never had, or have since lost, nearly all sense
> of "seeing" what they read, or even of hearing it spoken as a faint
> echo or whatever. the text, words, and grammar are fundamental to them, but
> the imagery and sensory experience may be absent.
>
> then on the other hand, we have people who can vividly imagine what they
> see, but can by no means remember the text, and grammatical or lexical
> analysis seems like some kind of horror to them.
>
>
> I guess I fall somewhere in between: sometimes I imagine, and sometimes I
> hear, but it depends on the content, and it is often not much more than a
> shadow and a slight echo (this seems stronger when I write, though...).
>
> I seem to be surprisingly good at analyzing grammar, as for
> the most part the rules of grammar seem "obvious"; then again, it is more of
> a hassle to analyze grammar than not to...
>
> usually when reading code, or a lot of technical material, all is silent...
>
>
Well, those were just examples. Like you say, one should let the AI
"decide" or "develop" its preferred form of "memory" - i.e. visual
rather than other... just like humans.
But humans have those mechanisms "built in", or rather, developed
naturally... some are preferred in some people, and so on.
As for me, I visualize technical processes in my mind when I'm doing
math. For me it's "animated"... it moves and "flows". :)
>
> well, an AI, for a machine, is clearly different from natural intelligence.
> so, based on text and data, an AI "can" exist, but will it be really at all
> like a human? probably not.
>
>
> AI will be to human intelligence what ASCII is to handwritten notes.
>
> it will be clear and legible, and easy to manipulate.
> will we have cursive, or "feeling", or the ability to include little
> drawings or dot letters with little hearts? no...
>
> it will be approximate, and have some useful advantages, but will be clearly
> enough something completely different...
>
>
I kind of disagree with this. I don't think an AI can exist with just
"text". Scientists in the field of robotics are giving their robots a
number of sensory feedback channels with the aim of helping to develop an
AI... well, *any true* AI.
I personally think that none of the current efforts in AI will ever
reach a self-conscious intelligence. Things like neural networks mixed with
fuzzy logic, "speaking bots" and the like... it's all great stuff for
many applications, but I don't think you can develop those programs to
the point where you'd call them "a mind".
I think the very first concept of a mind, an innate one, is the concept
of time - or the "experience of time passing by".
More on this later.
>
>
> well, probably a person's brain would still form, but they would
> end up being more like a machine, maybe eventually rising to the level of a
> Deus Ex storyline, or maybe living their life as little more than a data
> processor and never really becoming aware of their own existence outside of
> the machine, or going completely insane, or maybe turning out like the
> monkeys raised in complete isolation, who knows...
>
>
That's the point about time.
I can imagine that a human brain that is completely isolated from every
sense *could* develop some form of "mind/thought" if my assumption of
an "innate sense of time" were true. Meaningful impulses fed to its
brain could be "read" if perceived temporally.
The problem is that, for a senseless brain, there is no way to
communicate with it and make it understand the impulses. You cannot
"condition" any event/impulse - think of a dog... "you do that good
thing, and I give you some crunchy food". :)
And of course you cannot stimulate sensory areas - like pain, pleasure
and the like.
So you probably won't be able to "say" anything meaningful to a
"senseless child mind". Or it could "experience" those impulses for the
rest of its life understanding nothing... but rather, those impulses
would be its only experience of life - basically the impulses would become
its existence. (reminds me of a computer.... 0100111011101000 .... ;) )
But I still don't see the connection between an AI and the attempt to
"compress" data and extrapolate "meaning" out of it.
It would just be a smart algorithm, nothing more. (like the smart
chatbots around that you could talk to for a good bunch of minutes, thinking
it's a human... until they miserably and stupidly repeat themselves)
Best,
E.
| |
| niels.froehling@seies.de 2006-12-11, 6:57 pm |
> One problem with a differential compression test is that competitors
> could game the system (given there is prize money) by deliberately
> degrading compression when the test text is not appended. But even if
> we take safeguards against this, I think it will not make any
> difference. Simple compression is still a valid test. A compressor
> can either include a hand built language model, or it can learn the
> model from the input. The data is so huge that I think the second
> method will be preferred.
For what is it a valid test? I thought you wanted to break down the walls
between AI and compression. How do you single out algorithms that are supposed
to be better AI algorithms than compression algorithms? I mean, what's
your technique for identifying them?
I proposed a simple method which gives evidence about something; I
think it identifies the direction to leap in for algorithm developers
who want to apply compression algorithms to language learning.
My assumption that successful language learning leads to lower bpc
on the unknown fragment may be flawed anyway. I do not know (as hard
knowledge) whether there is a relationship between language learning and
bpc at all, and as with P vs. NP, we can't identify the best/valid
language-learning algorithms as long as we haven't tried (or calculated)
all of them.
> The 1 GB size I chose depends on my assumption that a language model
> can be learned from unlabeled text, i.e. positive examples only. I
> think it can be. By a language model, I mean 3 kinds of rules.
But Malcom, it doesn't matter how nice your thoughts about it are, as
long as you incentivize old-school tree-based statistical
compression modeling. I believe that to reach what you want (towards AI) you
have to harden the test case and motivate developers to work in the
AI direction. Descriptive modelling doesn't lead anywhere - English
is not the only language in the world; a human can learn _any_ language
that is not too complex for our brain, if trained. We can make more or
less sensible translations from any of them into another. Which means
the information is somehow not stored in English or any specific
language; some transformation happens on the way in and on the
way out.
I think it's not right to freeze the enwik test case in a form where the
outcome of the test degrades into something not useful - no verification
against alien text, no verification in another language, no
cross-verification of same-content enwiks in different languages (both
fed into one compressor).
I'm German; I also speak English, French, Spanish, and a bit of Italian,
can read Danish and Swedish with hiccups, and can quite understand a
Dutch guy. And I'm not a language genius; I was even really bad at
languages before I left Germany. I love German; IMHO it's the most
meaning-carrying language I've encountered, which means it transports
tones in words that others would call redundant, which opens a vast
space of different expressions. Besides that, its grammar additionally
supports fine-tuning what you want to weight in your sentences
(nominative, dative, accusative, genitive). This is reflected in the fact that
the bpc of German is very much higher than that of English.
So, when you tell us that we have now reached 1.2 bpc on enwik, and that
Shannon estimated the entropy of _English_ at 1 - 1.3 bpc, what
does this tell me? That AIs will speak English? Or that English is
simple?
Back when I was working on PPM + dictionary, I built dictionaries for all languages I
had sufficient data for, and I tuned my SSE on all languages I had data
for. I was identifying inter-language problems and things that became
more general with my model.
http://pyramidworkshop.cvs.sourcefo.../Text/Language/
Why do you /think/ (rather than test) that your model (the one you have in mind)
is sufficient, if you don't stress it?
I expect all of the following to hold only for English.
> 1. Lexical - what is a word and what isn't?
> 2. Semantics - which words can be meaningfully used together, like
> "snow" and "ice".
> 3. Syntax - rules that govern word order and sentence structure.
>
> Lexical: it is possible to build a dictionary from a corpus even if the
> rules for placing word boundaries are unknown. The rules can be
> learned from n-gram statistics only. I did some research on this in
> 2000: http://cs.fit.edu/~mmahoney/dissertation/lex1.html
> This allows a program to learn root words and suffixes.
>
> Semantics: words that appear close together usually have related
> meanings. Techniques like LSA can find clusters of words related by
> topic: if "snow" appears near "ice" and "ice" appears near "cold", then
> you can predict that "snow" will appear near "cold", even if this pair
> was not observed previously.
>
> Syntax: parts of speech (nouns, verbs, etc) of novel words can be
> learned from context. For example, given a string "the X is", you know
> that X is a noun and can predict sequences like "Xs are".
In German, nouns are capitalized, and the only other capitalized words come after a '.',
so there is no problem identifying nouns there; still, German's bpc is
much higher than English's.
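A minimal sketch of that capitalization rule, under the stated simplification that a capitalized word counts as a noun unless it starts a sentence. It misses nouns at sentence start and misfires on the formal "Sie"; the function name is hypothetical.

```python
import re

def german_noun_candidates(text: str) -> list[str]:
    """Capitalized words that do not start a sentence (hypothetical heuristic)."""
    nouns = []
    sentence_start = True
    for tok in re.findall(r"\w+|[.!?]", text):
        if tok in {".", "!", "?"}:
            sentence_start = True      # the next word opens a new sentence
            continue
        if tok[0].isupper() and not sentence_start:
            nouns.append(tok)
        sentence_start = False
    return nouns

if __name__ == "__main__":
    print(german_noun_candidates("Der Hund sieht die Katze. Sie läuft."))
```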
> So if language is learnable in this way, then the best design for a
> compressor/language model should be a relatively simple set of rules
> and a lot of data, regardless of whether the goal is compressed size +
> decompressor or differential compression. Whether the top compressors
> in the Hutter prize will have this form years from now remains to be
> seen.
I'm not trying to sink the Hutter Prize.
> -- Matt Mahoney
As always, no offense meant - the lossy compression algorithm called
'writing' stripped the emotional and facial information. :-)
Ciao
Niels
| |
| cr88192 2006-12-11, 6:57 pm |
"erpy" <info@forwardgames.com> wrote in message news:4576452f$0$3216$4fafbaef@reader2.news.tin.it...
> cr88192 wrote:
>> "erpy" <info@forwardgames.com> wrote in message
>> news:45760c2f$0$3208$4fafbaef@reader2.news.tin.it...
>>> Uhmm... how about beginning by modeling human memory/knowledge, and
>>> distinguishing types of memory - short/long term, lossy or lossless, etc.
>>> But my real "why are you doing this" is: it seems to me that human
>>> knowledge - and memory - is mainly based on sensory experience. Even
>>> when you read a book, visual memory plays a role; imagination plays sounds
>>> or pictures a face - i.e. the brain is translating the text into
>>> self-induced sensory experiences.
>> this is true, to an extent, and for a certain portion of people.
>> there are many people who either never had, or have since lost, nearly all sense
>> of "seeing" what they read, or even of hearing it spoken as a faint
>> echo or whatever. the text, words, and grammar are fundamental to them, but
>> the imagery and sensory experience may be absent.
>> then on the other hand, we have people who can vividly imagine what they
>> see, but can by no means remember the text, and grammatical or lexical
>> analysis seems like some kind of horror to them.
>> I guess I fall somewhere in between: sometimes I imagine, and sometimes I
>> hear, but it depends on the content, and it is often not much more than a
>> shadow and a slight echo (this seems stronger when I write, though...).
>> I seem to be surprisingly good at analyzing grammar, as for
>> the most part the rules of grammar seem "obvious"; then again, it is more of
>> a hassle to analyze grammar than not to...
>> usually when reading code, or a lot of technical material, all is silent...
> Well, those were just examples. Like you say, one should let the AI "decide" or "develop" its preferred form of "memory" - i.e. visual rather than other... just like humans.
> But humans have those mechanisms "built in", or rather, developed naturally... some are preferred in some people, and so on.
> As for me, I visualize technical processes in my mind when I'm doing math. For me it's "animated"... it moves and "flows". :)
for me, it varies...
many things are processed visually, some things are not.
visual processing is summoned up for mathy or geometric things, and some other things.
reasoning about loops or control flow, or many common algos, however, is not.
I don't imagine the loops going, or see them going, yet I have a sense of a loop spinning through a whole bunch of times, or maybe getting stuck in a loop, or going into an infinite recursion, ...
how my mind does it, I don't know, but if I imagine it, it doesn't really work.
I have at times tried to write, compile, and execute programs in dreams, and unlike in reality, one feels a definite strain as soon as one tries to run the program (often enough to destroy the dream state).
there are a few things worse though, some things one doesn't want to do while asleep (an example is crossing into the spirit world, which is a terrible experience if one is asleep, and bad enough if one is awake).
>>> It is even more so in a child. That's basically only sensory perception and
>>> processing - without being able to properly communicate.
>>> What's the point in trying to see whether it's possible to "breed" an AI when
>>> there is no such complex mechanism to begin with?
>> well, an AI, for a machine, is clearly different from natural intelligence.
>> so, based on text and data, an AI "can" exist, but will it be really at all
>> like a human? probably not.
>> AI will be to human intelligence what ASCII is to handwritten notes.
>> it will be clear and legible, and easy to manipulate.
>> will we have cursive, or "feeling", or the ability to include little
>> drawings or dot letters with little hearts? no...
>> it will be approximate, and have some useful advantages, but will be clearly
>> enough something completely different...
> I kind of disagree with this. I don't think an AI can exist with just "text". Scientists in the field of robotics are giving their robots a number of sensory feedback channels with the aim of helping to develop an AI... well, *any true* AI.
> I personally think that none of the current efforts in AI will ever reach a self-conscious intelligence. Things like neural networks mixed with fuzzy logic, "speaking bots" and the like... it's all great stuff for many applications, but I don't think you can develop those programs to the point where you'd call them "a mind".
I say, "it depends".
sensory input is great if your "mind" deals with sensory data or a physical world.
if not, it is not needed.
of course, my definition of "mind" here makes no requirement that it is anything even resembling human consciousness. it could be that we end up surrounded by more "intelligence" than we know how to deal with, but fail to recognize it as such since it doesn't fit our human precepts (talking, having a "good time", or debating about philosophical points, or maybe, realizing that it even exists, ...).
it would be amusing in such a world if the machines managed to far exceed human capacity, but no one succeeded in breaking down the wall separating human and machine consciousness (to make truly "human like" machines), and then, in a spark of mechanical creativity, the machines figured out how to make humans part of their world (and, by extension, machines became part of the human world...), yet the humans were completely oblivious to this fact, still thinking of AI as an "as of yet" unsolved problem...
ok, this is hardly creative, but yeah...
clearly the masses of programmers are as much a slave to the machines as the machines are to them...
is it that the machines are bending to the collective will of the programmers, or that by their actions the programmers are working as organic processors facilitating the existence and ongoing development of the mind of a machine?...
I think the very first concept of a mind, an innate one, is the concept of time - or the "experience of time passing by".
More on this later.
again, I will call this a "human" notion...
The question is: would you form your own thought/mind if you were
senseless right from your birth - with someone just feeding you with
"character" impulses to your brain (the input text)?
I might be wrong but the answer sounds like... no. :)
well, probably, a person's brain would still form, but more like, they would
end up being more like a machine, maybe eventually rising to the level of a
Deus Ex storyline, or maybe living their life as little more than a data
processor and never really becoming aware of their own existence outside of
the machine, or going completely insane, or maybe turning out like the
monkeys raised in complete isolation, who knows...
That's the point of time.
I can imagine that a human brain that is completely isolated from any sense *could* develop some form of "mind/thought" if my assumption of the "innate sense of time" would be true. Meaningful impulses fed to its brain could be "read" if temporally perceived.
The problem is that, for a senseless brain, there is no way to communicate with and make it understand the impulses. You cannot "condition" any event/impulse - think of a dog..."you do that good thing, and I give you some crunchy food". :)
And of course you cannot stimulate sensorial areas - like pain, pleasure and the like.
So you probably won't be able to "say" something meaningful to a "senseless child mind". Or, it could "experience" those impulses for the rest of his life understanding nothing...but rather like those impulses are its only experience of life - basically the impulses would become its existence. (reminds me of a computer.... 0100111011101000 .... ;) )
in a way though, all the impulses are senseless, and it is by neuronal magic that we come to understand.
I figure that a human brain wired into a machine would be similar, only that its impulses, and outputs, would be of a rather different sort, and likely the resultant mind would be very different.
But I still don't see the connection between an AI and the attempt to "compress" data and extrapolate "meaning" out of it.
It would just be a smart algorithm, nothing more (like the smart speaking bots around that you could talk to for a good bunch of minutes, thinking it's a human...until they miserably and stupidly repeat themselves).
yeah...
Best,
E.
| |
| Dmitry Shkarin 2006-12-11, 6:57 pm |
| Hello, Matt!
MM> Right now if you introduce some damage to a compressed text file and
MM> decompress it with the best compressors, you will start off generating
MM> random but readable text, but after a few words the output becomes
MM> degenerate, such as the same byte repeating over and over. This
MM> suggests you have a lot of room to improve the compression.
It is because the model is not frozen after processing the training text. If
the model is fixed, then the compressor will generate text-like data endlessly.
| |
| cr88192 2006-12-11, 6:57 pm |
|
"Dmitry Shkarin" <dmitry.shkarin@mtu-net.ru> wrote in message
news:el8vlj$1vth$1@news.mtu.ru...
> Hello, Matt!
>
> MM> Right now if you introduce some damage to a compressed text file and
> MM> decompress it with the best compressors, you will start off generating
> MM> random but readable text, but after a few words the output becomes
> MM> degenerate, such as the same byte repeating over and over. This
> MM> suggests you have a lot of room to improve the compression.
>
> It is because the model is not frozen after processing the training text. If
> the model is fixed, then the compressor will generate text-like data endlessly.
>
yes, good point, this is the case I had considered.
I guess one measure would be how much the generated text resembles "sane"
text.
however, what I had figured as a problem was as soon as there is a context
clash (say, emitting the same word, or sequence of letters), then the
"decompressor" would loop back to the previous sequence, and generate the
same sequence of words again.
then again, maybe this could be avoided some by including a prng, which
could generate enough noise that it generates unique and less-likely
repeating sequences.
just slightly curious as to what kind of output I would get...
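Dmitry's point about freezing the model, and the PRNG idea above, can be illustrated with a minimal sketch. It assumes a toy order-3 character model (nothing like the PPM or PAQ models discussed in this thread), and `train_model` and `generate` are hypothetical names. Because greedy decoding from a frozen model depends only on the last three characters, it must eventually either loop or reach an unseen context; sampling from the counts with a seeded PRNG keeps producing varied text.

```python
import random
from collections import Counter, defaultdict

ORDER = 3  # context length in characters (arbitrary choice for this sketch)

def train_model(text, order=ORDER):
    """Hypothetical helper: count which character follows each order-k
    context. The model is never updated afterwards, i.e. it is frozen."""
    model = defaultdict(Counter)
    for i in range(order, len(text)):
        model[text[i - order:i]][text[i]] += 1
    return model

def generate(model, seed, length, rng=None, order=ORDER):
    """Append up to `length` characters to `seed` using the frozen model.
    rng=None -> greedy: always take the most frequent successor.
    rng set  -> sample successors in proportion to their counts (PRNG noise)."""
    out = list(seed)
    for _ in range(length):
        counts = model.get("".join(out[-order:]))
        if not counts:          # context never seen in training: stop
            break
        if rng is None:
            nxt = counts.most_common(1)[0][0]
        else:
            symbols, weights = zip(*counts.items())
            nxt = rng.choices(symbols, weights=weights)[0]
        out.append(nxt)
    return "".join(out)

training = ("the cat sat on the mat and the dog sat on the log "
            "while the cat ate the rat and the dog ate the bone ")
model = train_model(training)
print(generate(model, "the", 60))                    # greedy: settles into a loop
print(generate(model, "the", 60, random.Random(1)))  # sampled: varies with the seed
```

With this training text the greedy output cycles through "the cat and the cat and ...", which is the kind of degenerate repetition described above.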
another issue though is modeling:
do we use some high-order ppm like model?
do we use a word ranking/association model?
an example of a hypothetical word ranking model.
each 'word' is a token, so, it is a sequence of letters, a sequence of
numbers, a punctuation symbol (',', '.', ':', ...), ...
we maintain a collection of words, likely via a hash (we can assume
common-first, and likely start ignoring things once the hash is full, say
64k to 1M words or such).
each word has a list of indices to following words (likely bounded), likely
representing a distance and a count (this word appears 4 words ahead, 3276
times, or such...).
as such, we can generate a ranking of most probable words from the context
(say, the last N words) by adding all the counts indicating that this word
applies in this context.
say, 64k words, 5 word context, 64 associations/word.
256kB hash, 128kB abs counts (limited to 16 bits).
probably about 512kB goes to storing words (assuming average about 8 bytes);
16MB word associations (32 bits/assoc, eg, 16 bits word, 12 bits count, 4
bits distance).
a lot of this could be tweakable, maybe with the tool predicting sane bounds
on memory use and failing if something unreasonable (say, above 512 or 768
MB or such...).
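The memory arithmetic above can be checked with a small sketch. It assumes 4-byte hash slots (which is what 256kB for 64k words implies) and the 16/12/4-bit association layout suggested above; `pack_assoc`, `unpack_assoc` and the word id used below are hypothetical. Note that a 12-bit count tops out at 4095, so counts would have to saturate or be rescaled.

```python
# Illustrative sketch of the estimate above; names are hypothetical.
WORDS = 64 * 1024       # vocabulary size
ASSOC_PER_WORD = 64     # bounded association list per word
AVG_WORD_BYTES = 8      # assumed average stored word length

def pack_assoc(word_id, count, distance):
    """Pack one association into 32 bits: 16-bit word id | 12-bit count | 4-bit distance."""
    if not (0 <= word_id < 1 << 16 and 0 <= count < 1 << 12 and 0 <= distance < 1 << 4):
        raise ValueError("field out of range")
    return (word_id << 16) | (count << 4) | distance

def unpack_assoc(v):
    return v >> 16, (v >> 4) & 0xFFF, v & 0xF

budget = {
    "hash (4 bytes/slot)": WORDS * 4,
    "absolute counts (16-bit)": WORDS * 2,
    "word text": WORDS * AVG_WORD_BYTES,
    "associations (32-bit)": WORDS * ASSOC_PER_WORD * 4,
}
for name, size in budget.items():
    print(f"{name:26s} {size // 1024:6d} kB")
print(f"{'total':26s} {sum(budget.values()) / 2**20:6.2f} MB")

# "this word appears 4 words ahead, 3276 times" as one packed value:
print(hex(pack_assoc(40000, 3276, 4)))
```

The associations dominate (16384 kB); everything else adds under 1 MB.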
dunno how useful this would be to "AI" though...
| |
| Matt Mahoney 2006-12-11, 6:57 pm |
|
niels.froehling@seies.de wrote:
>
> For what is it a valid test? I thought you wanted to break down the walls
> between AI and compression. How do you track down algorithms that are
> supposed to be better AI algorithms than compression algorithms? I mean,
> what's your technique for identifying them?
> I proposed a simple method which gives evidence about something; I
> think it identifies the direction to leap for algorithm developers
> who want to apply compression algorithms to language learning.
> My assumption that successful language learning leads to lower bpc
> on the unknown fragment may be flawed anyway. I do not know (as hard
> knowledge) whether there is any relationship between language learning
> and bpc at all, and as with P vs. NP, we can't identify the best/valid
> language-learning algorithms as long as we haven't tried (or calculated)
> all of them.
>
>
> But Malcom, it doesn't matter how nice your thoughts about it are, as
> long as you motivate old-school tree-based statistical
> compression modeling. I believe that to reach what you want (towards AI)
> you have to harden the test case and motivate developers to move in the
> AI direction. Descriptive modeling doesn't lead anywhere - English
> is not the only language in the world; a human can learn _any_ language
> that is not too complex for our brain, if trained. We can make more or
> less sensible translations from any of them into another. Which means
> the information is somehow not stored in English or any specific
> language; some transformation happens on the way in and on the
> way out.
> I think it's not right to freeze the enwik test case when the
> outcome of the test degrades into uselessness - no verification
> against alien text, no verification in another language, no
> cross-verification of same-content enwiks in different languages (both
> fed into one compressor).
> I'm German; I also speak English, French, Spanish, and a bit of Italian,
> can read Danish and Swedish with hiccups, and can quite understand a
> Dutch guy. And I'm not a language genius; I was even really bad at
> languages before I left Germany. I love German; IMHO it's the most
> meaning-carrying language I've encountered, which means it carries
> tones in the words that others would call redundant, which opens a vast
> space of different expressions, besides its grammar, which additionally
> supports fine-tuning what you want to weight in your sentences
> (nominative, dative, accusative, genitive). This is reflected in the fact
> that the bpc of German is very much higher than that of English.
> So, when you tell us that we have now reached 1.2 bpc on enwik, and that
> Shannon estimated the entropy of the _English_ language at 1 - 1.3 bpc,
> what does this give me? That AIs will speak English? Or that English is
> simple?
> Back when I was working on PPM + Dict, I built dictionaries for all
> languages I had sufficient data about, and I tuned my SSE on all languages
> I had data about. I was identifying inter-language problems and things
> that became more general with my model.
>
> http://pyramidworkshop.cvs.sourcefo.../Text/Language/
>
> Why do you /think/ (rather than try) that your model (the one you have in
> mind) is sufficient, if you don't stress it?
>
> I expect all of the following to hold only for English.
>
>
> In German, nouns are capitalized, and the other capitals come after a '.',
> so there is no problem identifying nouns there, and yet German's bpc is
> much higher than English's.
>
>
> I'm not trying to fail the HutterPrize.
>
>
> As always, no offense meant - the lossy compression algorithm called
> 'writing' stripped the emotional and facial information. :-)
> Ciao
> Niels
Most of your questions can be answered here.
http://cs.fit.edu/~mmahoney/compression/rationale.html
A Turing test does not need to be bilingual to demonstrate AI. A test
in a single language is sufficient. It doesn't really matter what
language.
I know most compression algorithms use simple statistics to compress.
My goal is to change that. Humans need to use vast knowledge to
predict the next bits in strings like "the cat caught a mo_". If you
figure out how to build a language model that can learn that kind of
knowledge from a huge corpus and then apply it to prediction, then your
model plus an arithmetic coder will be a very powerful compressor. The
same model could be used to solve many other hard problems, like speech
recognition that works as well as the human ear (language model +
acoustic model). Right now speech recognition is pretty useless even
when it works because there is no intelligence behind it ("press 1 or
say 'yes'"). It is the reason we still use keyboards.
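The link between prediction and compressed size can be made concrete: an arithmetic coder driven by a model spends close to -log2 p bits on a symbol the model assigned probability p. The sketch below only computes that ideal cost rather than running a coder, using adaptive order-k character models with add-one smoothing over the text's own alphabet (an assumption: the alphabet is taken as known to both encoder and decoder). These models have no knowledge at all; a model that knew what cats catch would give "u" after "mo" a higher probability and pay fewer bits for it.

```python
import math
from collections import Counter, defaultdict

def code_length_bits(text, order, alphabet_size):
    """Ideal cost in bits of coding `text` with an adaptive order-k character
    model: each symbol costs -log2 p(symbol | previous k symbols). An
    arithmetic coder driven by the same model comes within a few bits of it.
    Add-one smoothing over an alphabet of `alphabet_size` symbols (assumed
    known to both encoder and decoder)."""
    counts = defaultdict(Counter)
    bits = 0.0
    for i, ch in enumerate(text):
        seen = counts[text[max(0, i - order):i]]
        p = (seen[ch] + 1) / (sum(seen.values()) + alphabet_size)
        bits -= math.log2(p)
        seen[ch] += 1  # update after coding, exactly as the decoder would
    return bits

sample = "the cat caught a mouse. " * 20
alphabet = len(set(sample))
for k in range(4):
    print(f"order {k}: {code_length_bits(sample, k, alphabet) / len(sample):.2f} bits/char")
```

Better prediction (here, longer contexts on a repetitive text) directly means fewer bits; the hard part described above is getting better prediction on text that does not simply repeat.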
Of course a language model doesn't solve problems like vision or
robotics, just text based AI. But this is hard enough and important
enough, because it should allow you to talk to your computer in natural
language. That is controversial, but I think that all the knowledge
needed for a program to appear intelligent (as in the Turing test) can
be learned from text only. Of course such knowledge is not grounded,
so it knows that the sky is blue in the same way that a blind person
knows it or Google knows it. But that isn't really important because
when you talk to someone on the phone or by email, you can't tell if
they're blind.
-- Matt Mahoney
| |
| Sportman 2006-12-11, 6:57 pm |
| Matt Mahoney wrote:
> Of course a language model doesn't solve problems like vision or
> robotics, just text based AI. But this is hard enough and important
> enough, because it should allow you to talk to your computer in natural
> language. That is controversial, but I think that all the knowledge
> needed for a program to appear intelligent (as in the Turing test) can
> be learned from text only. Of course such knowledge is not grounded,
> so it knows that the sky is blue in the same way that a blind person
> knows it or Google knows it. But that isn't really important because
> when you talk to someone on the phone or by email, you can't tell if
> they're blind.
I think there is already some progress out there; compare Google with
the more intelligent Hakia:
http://www.google.com/search?q=paq+contributions
http://www.hakia.com/search.aspx?q=paq+contributions
Does anybody know of more publicly accessible projects like
Hakia?
| |
| James A. Bowery 2006-12-11, 6:57 pm |
|
Matt Mahoney wrote:
> Of course a language model doesn't solve problems like vision or
> robotics, just text based AI. But this is hard enough and important
> enough, because it should allow you to talk to your computer in natural
> language. That is controversial,
I agree the natural language communication problem is the single most
important product of this focus. I find it very fascinating that there
aren't all kinds of governments and NGOs jumping at the chance to push
the Hutter Prize, or a version of it based on the enwik9 corpus to
levels that very greatly exceed the largest of the other scientific
projects (let alone prize awards).
> but I think that all the knowledge
> needed for a program to appear intelligent (as in the Turing test) can
> be learned from text only. Of course such knowledge is not grounded,
> so it knows that the sky is blue in the same way that a blind person
> knows it or Google knows it. But that isn't really important because
> when you talk to someone on the phone or by email, you can't tell if
> they're blind.
The blind person comparison is very good pedagogy. What is the source
of that comparison? If it's you, congratulations -- you are convincing
me I should become a harrier for the BDNF.
| |
| Matt Mahoney 2006-12-13, 6:56 pm |
|
James A. Bowery wrote:
> Matt Mahoney wrote:
>
> I agree the natural language communication problem is the single most
> important product of this focus. I find it very fascinating that there
> aren't all kinds of governments and NGOs jumping at the chance to push
> the Hutter Prize, or a version of it based on the enwik9 corpus to
> levels that very greatly exceed the largest of the other scientific
> projects (let alone prize awards).
Probably most people don't understand the connection between
compression and AI.
>
> The blind person comparison is very good pedagogy. What is the source
> of that comparison? If it's you, congratulations -- you are convincing
> me I should become a harrier for the BDNF.
No, but the point is a language model does not need to be grounded to
solve most tasks. But an exception occurred to me. I think that
grounding is required to write a "joke detector", a program that inputs
a joke (as text), and outputs whether or not it is funny. To my
knowledge, no such program has ever been written. The problem is that
what people find funny is the process of grounding or associating text
for the first time to knowledge they have already learned nonverbally.
This is why a joke is not funny the second time you hear it.
Here is an example that makes use of the technique several times.
http://www.comics.com/comics/frazz/...z-20061112.html
-- Matt Mahoney
| |
| Rob Freeman 2006-12-17, 6:55 pm |
| Matt Mahoney wrote:
> ...
> I know most compression algorithms use simple statistics to compress.
> My goal is to change that. Humans need to use vast knowledge to
> predict the next bits in strings like "the cat caught a mo_". If you
> figure out how to build a language model that can learn that kind of
> knowledge from a huge corpus and then apply it to prediction, then your
> model plus an arithmetic coder will be a very powerful compressor. The
> same model could be used to solve many other hard problems, like speech
> recognition that works as well as the human ear (language model +
> acoustic model). Right now speech recognition is pretty useless even
> when it works because there is no intelligence behind it ("press 1 or
> say 'yes'"). It is the reason we still use keyboards.
Matt,
It depresses me to hear you continue to get the problem so completely
back-to-front like this.
This conception of the problem is the reason we still fail. We'll
continue to fail as long as we continue to conceive the problem this
way.
It is not because we don't have a knowledge (AI) that we can't do
speech recognition. We already have all the knowledge we need. Text is
a great representation for knowledge. The problem is we don't interpret
text the right way.
You've got the problem back-to-front. We don't need knowledge so we can
analyze text, we need to analyze text right so we can decode knowledge.
To be precise, we need to understand there is no single best way to
order/compress text. No sufficient global language model. Give up the
preconception of a sufficient global language model and it will be
obvious how to get the knowledge relevant to each bit of text, and
recognize speech, every time.
We already know how to "learn" knowledge from text. We just need to
understand "learning" knowledge/grammar must always be ad-hoc/partial.
I've been saying this for a while here and there. But there may be
others out looking for answers who need to know your analysis of the
problem is not the only one.
-Rob
| |
| Matt Mahoney 2006-12-17, 9:56 pm |
|
Rob Freeman wrote:
> Matt Mahoney wrote:
>
> Matt,
>
> It depresses me to hear you continue to get the problem so completely
> back-to-front like this.
>
> This conception of the problem is the reason we still fail. We'll
> continue to fail as long as we continue to conceive the problem this
> way.
>
> It is not because we don't have a knowledge (AI) that we can't do
> speech recognition. We already have all the knowledge we need. Text is
> a great representation for knowledge. The problem is we don't interpret
> text the right way.
>
> You've got the problem back-to-front. We don't need knowledge so we can
> analyze text, we need to analyze text right so we can decode knowledge.
>
> To be precise, we need to understand there is no single best way to
> order/compress text. No sufficient global language model. Give up the
> preconception of a sufficient global language model and it will be
> obvious how to get the knowledge relevant to each bit of text, and
> recognize speech, every time.
>
> We already know how to "learn" knowledge from text. We just need to
> understand "learning" knowledge/grammar must always be ad-hoc/partial.
>
> I've been saying this for a while here and there. But there may be
> others out looking for answers who need to know your analysis of the
> problem is not the only one.
>
> -Rob
I think I understand that the language model has to be learned from the
same text you are compressing. I chose the size of the large text
benchmark with this in mind. Did I miss something?
-- Matt Mahoney
| |
| Rob Freeman 2006-12-18, 7:56 am |
| Matt Mahoney wrote:
> Rob Freeman wrote:
>
> I think I understand that the language model has to be learned from the
> same text you are compressing. I chose the size of the large text
> benchmark with this in mind. Did I miss something?
You seem to be missing the idea that the same text can contain
contradictory patterns (spread across the whole text.)
-Rob
| |
| Matt Mahoney 2006-12-19, 3:56 am |
|
Rob Freeman wrote:
> Matt Mahoney wrote:
>
> You seem to be missing the idea that the same text can contain
> contradictory patterns (spread across the whole text.)
>
> -Rob
So what if it does? Why can't this be modeled?
-- Matt Mahoney
| |
| Rob Freeman 2006-12-19, 3:56 am |
| Matt Mahoney wrote:
> Rob Freeman wrote:
> So what if it does? Why can't this be modeled?
You can model it. You just can't model it all ways at once.
That is the insight we need to get speech recognition, not a model for
knowledge. We already have a model for knowledge, which we can "learn"
from text at will. The crucial missing piece is the insight this
knowledge must always be partial, that we must keep the raw text and
extract from it the knowledge relevant to each problem, as that problem
arises.
-Rob
| |
| Matt Mahoney 2006-12-19, 6:56 pm |
|
Rob Freeman wrote:
> Matt Mahoney wrote:
>
>
> You can model it. You just can't model it all ways at once.
>
> That is the insight we need to get speech recognition, not a model for
> knowledge. We already have a model for knowledge, which we can "learn"
> from text at will. The crucial missing piece is the insight this
> knowledge must always be partial, that we must keep the raw text and
> extract from it the knowledge relevant to each problem, as that problem
> arises.
This is normal in text compression. Given a sequence "AB...AC...A_", a
compressor will predict both B and C with some probability. Language
models work this way, except the contexts can be more complex.
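In the simplest case this is just counting what followed the current context. A minimal sketch for the order-1 case, assuming each "..." gap is written as a separator symbol '.' and using raw relative frequencies with no smoothing (a real compressor would also reserve some probability for unseen symbols); `order1_prediction` is a hypothetical name.

```python
from collections import Counter, defaultdict

def order1_prediction(history):
    """Relative frequencies of the symbols that followed the last symbol of
    `history` earlier on (no smoothing; unseen context -> empty dict)."""
    if not history:
        return {}
    follow = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        follow[prev][nxt] += 1
    counts = follow[history[-1]]
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()} if total else {}

# "AB...AC...A_" with each "..." written as the separator '.'
print(order1_prediction("AB.AC.A"))  # {'B': 0.5, 'C': 0.5}
```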
-- Matt Mahoney
| |
| Rob Freeman 2006-12-20, 3:56 am |
| Matt,
Gotta rush to catch a plane, but just quickly...
Matt Mahoney wrote:
> Rob Freeman wrote:
>
> This is normal in text compression. Given a sequence "AB...AC...A_", a
> compressor will predict both B and C with some probability. Language
> models work this way, except the contexts can be more complex.
Is it randomness you are saying is normal in text compression?
Anyway, if we only had to predict observed sequences life would be easy
(dull, but easy.)
Make your example a bit richer.
What about "AX...DX...DB...AZ...YZ...YC...A_"?
Think about this particularly in the context of AI (grammar,
whatever...)
-Rob
| |
| Matt Mahoney 2006-12-20, 6:55 pm |
|
Rob Freeman wrote:
> Matt,
>
> Gotta rush to catch a plane, but just quickly...
>
> Matt Mahoney wrote:
>
> Is it randomness you are saying is normal in text compression?
>
> Anyway, if we only had to predict observed sequences life would be easy
> (dull, but easy.)
>
> Make your example a bit richer.
>
> What about "AX...DX...DB...AZ...YZ...YC...A_"?
This is the context modeling problem faced by PPM and CM type
algorithms. In the order 1 context "A" we have "X" and "Z". In the
order 0 context we have X, X, B, Z, Z, C. The problem is how to
combine these to form a probability distribution. Generally we give
the higher order contexts greater weight. Older PPM algorithms do this
according to a formula that depends on the number of occurrences of the
higher order context and the number of different values that appear in
this context. Newer variants such as PPMZ, and ppmd and ppmonstr, which use
its secondary escape estimation (SEE), do this adaptively instead of by a fixed formula. Context mixing
algorithms such as PAQ6 compute a probability distribution for each
order and combine them by weighted averaging and adapt the weights to
improve compression. PAQ7/8 combines predictions using a neural
network, which is equivalent to first transforming the probabilities
into the logistic domain, x = log(p/(1-p)), followed by adaptive weighted
averaging of the x values and then the inverse transform, 1/(1+e^-x).
Here, p is the probability of the next bit (not the next byte).
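A minimal floating-point sketch of the logistic mixing described above, assuming two input models and online gradient descent on coding cost; PAQ's actual mixer uses fixed-point arithmetic and many more inputs, none of which is shown here. `Mixer`, its starting weights and its learning rate are illustrative.

```python
import math
import random

def stretch(p):
    """Transform a probability into the logistic domain: ln(p / (1 - p))."""
    p = min(max(p, 1e-6), 1 - 1e-6)  # keep away from 0 and 1
    return math.log(p / (1 - p))

def squash(x):
    """Inverse of stretch: 1 / (1 + e^-x)."""
    return 1 / (1 + math.exp(-x))

class Mixer:
    """Weighted averaging of input predictions in the logistic domain,
    with weights adapted online to reduce coding cost (illustrative)."""

    def __init__(self, n_inputs, learning_rate=0.02):
        self.w = [0.3] * n_inputs   # arbitrary starting weights
        self.lr = learning_rate
        self.x = [0.0] * n_inputs
        self.p = 0.5

    def predict(self, probs):
        """probs: each model's probability that the next bit is 1."""
        self.x = [stretch(p) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        """(bit - p) * x_i is the negative gradient of the coding cost
        -ln P(bit) with respect to w_i, so this is gradient descent."""
        err = bit - self.p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]

# Model 0 always leans toward the correct bit, model 1 toward the wrong one.
rng = random.Random(0)
m = Mixer(2)
for _ in range(500):
    bit = rng.randint(0, 1)
    p_good = 0.8 if bit else 0.2
    m.predict([p_good, 1 - p_good])
    m.update(bit)
print([round(w, 2) for w in m.w])
```

The mixer learns to trust the reliable model and to use the unreliable one with a negative weight, so the mixed prediction ends up more confident than either input.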
> Think about this particularly in the context of AI (grammar,
> whatever...)
PAQ allows extending the context to be arbitrary features of the input
history, such as whole words, groups of words, or syntactic or semantic
contexts (set of words, sparse contexts). This is already being done
for the Hutter prize.
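A minimal sketch of what "arbitrary features of the input history" can mean in practice: several context ids computed from the same history, each of which could select the statistics of its own model before mixing. The particular features, the CRC32 hash and the function names are assumptions for illustration, not PAQ's actual context set.

```python
import re
import zlib

def h(*parts):
    """Hash strings to a 32-bit context id (CRC32 chosen for determinism)."""
    return zlib.crc32("\x00".join(parts).encode("utf-8"))

def contexts(history):
    """Hypothetical feature set: several context ids for the same history,
    each of which could select the statistics of its own model."""
    text = history.lower()
    words = re.findall(r"[a-z]+", text)
    partial = re.search(r"[a-z]*$", text).group()   # word being typed, may be ""
    previous = words[:-1] if partial else words     # completed words only
    return {
        "order1": h("o1", history[-1:]),
        "order2": h("o2", history[-2:]),
        "word": h("w", partial),
        "word+prev": h("wp", previous[-1] if previous else "", partial),
        "sparse": h("sp", history[-3:-1]),          # skips the most recent byte
    }

a = contexts("the cat caught a mo")
b = contexts("the cat caught the mo")
print([k for k in a if a[k] == b[k]])  # features that generalize across both histories
```

Word-level features like these let statistics learned after "a mo" apply no matter what came earlier in the sentence.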
-- Matt Mahoney
| |
| markn@ieee.org 2006-12-21, 9:56 pm |
| > This is the context modeling problem faced by PPM and CM type
> algorithms. In the order 1 context "A" we have "X" and "Z". In the
> order 0 context we have X, X, B, Z, Z, C. The problem is how to
> combine these to form a probability distribution.
Long, long ago when working on my first PPM program (around the time of
the DDJ programming contest) I spent a lot of time experimenting and
puzzling over this. At the time I couldn't find too much in the way of
published results and never achieved any blinding insight on my own.
But even today the solutions seem a bit inelegant.
Presumably the human brain does something very much like this when
listening to speech. We hear a given word and choose a high-probability
candidate out of the spectrum. Most of the time we are able to squeeze
the probability of the highest candidate so well that we have little or
no doubt that we have heard the correct word. With printed text, we
eliminate the problem of speech recognition and do even better.
It would seem that the human brain does this using semantics, right?
(Back to Hutter-land.) So Matt, do you think any of the more recent
modeling techniques you described come close to actually using
semantics in their model? Or is it bad thinking to view "semantics" as
somehow different from "probability based models?" Is it hubris to
think that there is somehow something different about the way we
interpret a data stream, and that the word "understanding" is not as
special as we think it is?
|
| Mark Nelson - http://marknelson.us
|
| |
| Matt Mahoney 2006-12-21, 9:56 pm |
|
markn@ieee.org wrote:
>
> Long, long ago when working on my first PPM program (around the time of
> the DDJ programming contest) I spent a lot of time experimenting and
> puzzling over this. At the time I couldn't find too much in the way of
> published results and never achieved any blinding insight on my own.
> But even today the solutions seem a bit inelegant.
I like the PPMZ approach of measuring the zero frequency probability in
various contexts. Older variants such as PPMC are based on the
Good-Turing hypothesis, which says essentially that the probability of
a future novel event can be estimated from the past rate of novel
events. So if a context is observed 50 times with 5 different
outcomes, then the probability that the next outcome will not be one of
those 5 is 5/50. This is harder to do when the numbers are small. In
that case, PPMC uses an ad-hoc solution, and PPMZ measures.
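The arithmetic can be sketched directly. `escape_estimates` and `ppmc_symbol_prob` are hypothetical helpers; the second estimate uses PPM escape method C as usually defined (the escape is counted as one extra event per distinct symbol seen), which gives d/(n+d) = 5/55 rather than the raw novel-event rate 5/50, and each seen symbol s then gets c(s)/(n+d).

```python
from collections import Counter

def escape_estimates(counts):
    """counts: symbol -> number of times it followed this context.
    Returns (raw novel-event rate d/n, PPM method C escape probability d/(n+d))."""
    n = sum(counts.values())   # times the context has been seen
    d = len(counts)            # distinct symbols seen in it
    return d / n, d / (n + d)

def ppmc_symbol_prob(counts, symbol):
    """Method C probability of an already-seen symbol: c(s) / (n + d)."""
    n, d = sum(counts.values()), len(counts)
    return counts[symbol] / (n + d)

ctx = Counter({"a": 20, "b": 15, "c": 10, "d": 3, "e": 2})   # 50 events, 5 outcomes
rate, esc = escape_estimates(ctx)
print(rate, round(esc, 4))   # 0.1 0.0909
```

With method C the escape and the seen symbols share one denominator, so the probabilities sum to 1 without further normalization.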
> Presumably the human brain does something very much like this when
> listening to speech. We hear a given word and choose a high-probability
> candidate out of the spectrum. Most of the time we are able to squeeze
> the probability of the highest candidate so well that we have little or
> no doubt that we have heard the correct word. With printed text, we
> eliminate the problem of speech recognition and do even better.
The human brain is very good at combining many sources of evidence.
This was part of my reasoning in using neural networks in PAQ7/8 to
combine the outputs of different models.
> It would seem that the human brain does this using semantics, right?
> (Back to Hutter-land.) So Matt, do you think any of the more recent
> modeling techniques you described come close to actually using
> semantics in their model? Or is it bad thinking to view "semantics" as
> somehow different from "probability based models?" Is it hubris to
> think that there is somehow something different about the way we
> interpret a data stream, and that the word "understanding" is not as
> special as we think it is?
Semantics is just a statistical property of text. When we think of
words like "rain" and "snow" being semantically related, it really
means that these words are likely to appear near each other. So a
compressor that sees one word can predict the other.
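A minimal sketch of that view of semantics: count how often two words occur within a few words of each other and compare with what independent placement would give. The window size, the tiny corpus and the `lift` measure (observed over expected co-occurrences) are illustrative assumptions, not what any of the compressors discussed here actually compute.

```python
import re
from collections import Counter

WINDOW = 5  # assumed co-occurrence window in words

def cooccurrence(text, window=WINDOW):
    words = re.findall(r"[a-z]+", text.lower())
    pairs = Counter()
    for i, w in enumerate(words):
        for v in words[i + 1:i + 1 + window]:
            if v != w:
                pairs[frozenset((w, v))] += 1
    return words, Counter(words), pairs

def lift(a, b, stats, window=WINDOW):
    """Observed co-occurrences of a and b within `window` words, divided by
    the number expected if the words were placed independently."""
    words, freq, pairs = stats
    expected = freq[a] * freq[b] * 2 * window / len(words)
    return pairs[frozenset((a, b))] / expected

corpus = ("The tax office opened. New tax rules were published by the office. "
          "Rain and snow fell all week. The forecast says rain then snow. "
          "Heavy snow after the rain.")
stats = cooccurrence(corpus)
print(round(lift("rain", "snow", stats), 2), lift("rain", "tax", stats))  # 1.61 0.0
```

A lift above 1 is exactly the "likely to appear near each other" signal, and a compressor can exploit it by raising the probability of "snow" after it has seen "rain".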
Likewise, syntax is a statistical property. We classify words as
nouns, verbs, etc., but what this really means is that words in the
same class are likely to appear in the same context. We can then use a
word's class, in addition to the word itself, as context. Thus, given
"a X was", you know that X must be a noun, so you can predict sequences
like "the X is".
The top compressors like paq8hp7 and durilca4linux model syntax and
semantics by grouping related words in their dictionaries so that words
in the same class have the same high order bits in their dictionary
codes. Then the high order bits can be predicted independently and
also used as context. In paq8hp7 the dictionary ordering was done
manually. In durilca4linux, I believe the words were clustered based
on context similarity. I have been doing experiments in this area to
construct similar dictionaries.
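A minimal sketch of the idea that words in the same class share the high-order bits of their dictionary codes. The fixed split (class number in the high bits, 8 low bits for the position within the class) and the groups, loosely based on the paq8hp7 word list posted in this thread, are assumptions for illustration; this is not the actual paq8hp7 or durilca4linux dictionary format.

```python
def build_codes(classes, low_bits=8):
    """classes: list of word groups, related words together.
    Returns word -> code, where the high bits hold the group number and the
    low bits the position within the group (illustrative layout)."""
    codes = {}
    for cls_id, group in enumerate(classes):
        if len(group) > 1 << low_bits:
            raise ValueError("group too large for low_bits")
        for pos, word in enumerate(group):
            codes[word] = (cls_id << low_bits) | pos
    return codes

def word_class(code, low_bits=8):
    """The high-order bits: the part that can be predicted on its own or used as context."""
    return code >> low_bits

codes = build_codes([
    ["daughter", "son", "sister", "brother", "mother", "father"],
    ["minus", "plus"],
    ["once", "twice"],
    ["few", "many", "much", "more"],
])
print(word_class(codes["mother"]), word_class(codes["father"]), word_class(codes["plus"]))  # 0 0 1
```

A model that has only seen "his father" can still predict the class bits after "his", which already narrows "mother" down to a handful of codes.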
But I will probably not use this approach. I think there are better
ways to model language. Syntax and semantics cannot be mapped cleanly
to a linear word order. Also, the programs currently cannot model
higher level grammatical structures such as phrases and sentences.
-- Matt Mahoney
| |
| Rob Freeman 2006-12-21, 9:56 pm |
| Matt Mahoney wrote:
> ...
> The top compressors like paq8hp7 and durilca4linux model syntax and
> semantics by grouping related words ...
> ...In durilca4linux, I believe the words were clustered based
> on context similarity. I have been doing experiments in this area to
> construct similar dictionaries.
Yes, this is what I was thinking of. What kind of distribution do you
get if you cluster symbols based on context similarity in the example I
gave?
-Rob
| |
| Matt Mahoney 2006-12-22, 6:56 pm |
| Rob Freeman wrote:
> Matt Mahoney wrote:
>
> Yes, this is what I was thinking of. What kind of distribution do you
> get if you cluster symbols based on context similarity in the example I
> gave?
I don't know about your example, but here is a snip of the words near
"mother" in paq8hp7 using manual organization of the dictionary. You
can extract the whole dictionary (as a temporary file) by running the
program.
male
women
men
females
males
people
children
daughter
son
sister
brother
mother
father
minus
plus
outside
inside
once
twice
few
many
much
extra
more
Here is a similar snip from durilca4linux_2, which is based on grouping
by similar context in enwik9. You will have to ask Dmitry Shkarin for
the exact details.
patron
son
grandson
own
solo
friendship
flagship
relationship
elder
shoulder
career
younger
teacher
father
grandfather
mother
brother
partner
daughter
sister
lover
hair
heir
successor
predecessor
hands
friends
minds
experiences
speeches
I did some experiments in which I assigned a 128 element context vector
to the 32K most frequent words in enwik8 by hashing the surrounding
words, weighted inversely by distance, then normalizing the vectors to
unit length and performing bottom-up binary clustering. I also did
stemming on low-frequency words. Here is a snip:
sacrificial
greedy
Menace
aegis
Bazaar
Temptations
anthropic
holographic
slippery
Roaring
father
mother
brother
uncle
wife
cousin
grandfather
grandmother
parents
pupils
contemporaries
colleagues
companions
professors
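A rough sketch of the experiment described above, under several assumptions the post does not specify: a neighbourhood of 3 words on each side, CRC32 as the hash, and clusters compared by the direction of their summed vectors. It leaves out the stemming step, runs on a toy corpus instead of enwik8, and uses a naive merge loop that is only practical for a few dozen words, not 32K. Function names are illustrative.

```python
import math
import re
import zlib
from collections import Counter

DIM = 128    # context vector length used in the post
RADIUS = 3   # neighbours on each side (an assumption)

def context_vectors(text, vocab_size=30):
    """Unit-length hashed context vector for each of the most frequent words."""
    words = re.findall(r"[a-z]+", text.lower())
    vocab = [w for w, _ in Counter(words).most_common(vocab_size)]
    vecs = {w: [0.0] * DIM for w in vocab}
    for i, w in enumerate(words):
        if w not in vecs:
            continue
        for j in range(max(0, i - RADIUS), min(len(words), i + RADIUS + 1)):
            if j != i:
                slot = zlib.crc32(words[j].encode()) % DIM   # hash the neighbour
                vecs[w][slot] += 1.0 / abs(i - j)            # weight 1/distance
    for w, v in vecs.items():
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs[w] = [x / norm for x in v]
    return vecs

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_order(vecs):
    """Bottom-up binary clustering: repeatedly merge the two clusters whose
    summed vectors point in the most similar direction. The leaf order of
    the resulting binary tree places similar words next to each other."""
    clusters = [([w], list(v)) for w, v in vecs.items()]
    while len(clusters) > 1:
        _, i, j = max(
            (cosine(clusters[i][1], clusters[j][1]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        (wa, va), (wb, vb) = clusters[i], clusters[j]
        clusters[i] = (wa + wb, [x + y for x, y in zip(va, vb)])
        del clusters[j]   # j > i, so index i is unaffected
    return clusters[0][0] if clusters else []

text = ("my mother said hello . my father said hello . her mother went home . "
        "her father went home . two plus three is five . two minus three is odd . "
        "one plus one is two . one minus one is zero .")
vecs = context_vectors(text)
order = cluster_order(vecs)
print(order)
```

The leaf order is the linear dictionary ordering; as noted below, flattening the tree this way is where much of the information in the vectors is lost.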
But as I said, this is not the best approach. The syntax and semantics
of a word is better expressed as a vector in hundreds of dimensions.
When you impose a linear ordering, as in a dictionary, a lot of this
information is lost.
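As a rough illustration of the experiment described above, here is a minimal Python sketch. It is not Matt's actual code. Several details are assumptions: the 3-word window, the CRC32 hash, merging the pair of clusters whose summed vectors are most similar, and the names `context_vectors` and `cluster_order`. Stemming is omitted. The naive merge loop is cubic in the vocabulary size, so it suits a toy corpus rather than 32K words. Reading the leaves of the resulting binary tree from left to right gives one linear dictionary order.

```python
import math
import zlib
from collections import Counter

DIM = 128  # context-vector length used in the experiment described above


def context_vectors(tokens, vocab_size, window=3, dim=DIM):
    """Give each of the `vocab_size` most frequent words a unit-length context vector.

    Assumed weighting: each neighbor within `window` positions adds 1/distance
    to the bucket chosen by a CRC32 hash of the neighbor word.
    """
    vocab = {w for w, _ in Counter(tokens).most_common(vocab_size)}
    vecs = {w: [0.0] * dim for w in vocab}
    for i, word in enumerate(tokens):
        if word not in vecs:
            continue
        for d in range(1, window + 1):
            for j in (i - d, i + d):
                if 0 <= j < len(tokens):
                    bucket = zlib.crc32(tokens[j].encode("utf-8")) % dim
                    vecs[word][bucket] += 1.0 / d
    unit = {}
    for word, v in vecs.items():
        norm = math.sqrt(sum(x * x for x in v))
        unit[word] = [x / norm for x in v] if norm > 0 else v
    return unit


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def cluster_order(vecs):
    """Bottom-up binary clustering; returns the words in leaf order of the tree."""
    # Each cluster is (ordered member list, summed member vectors).
    clusters = [([w], list(vecs[w])) for w in sorted(vecs)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(clusters[a][1], clusters[b][1])
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        merged = (clusters[a][0] + clusters[b][0],
                  [x + y for x, y in zip(clusters[a][1], clusters[b][1])])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0]


tokens = ("the mother said hello . the father said hello . "
          "a dog ran fast . a cat ran fast .").split()
vecs = context_vectors(tokens, vocab_size=50)
print(cluster_order(vecs))
```

On real data you would tokenize enwik8 and ask for roughly 32K words. A practical version, however, needs a much faster clustering method than this exhaustive pairwise search.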
-- Matt Mahoney
| |
| bybell@rocketmail.com 2006-12-23, 4:05 am |
| Matt Mahoney wrote:
> But I will probably not use this approach. I think there are better
> ways to model language. Syntax and semantics cannot be mapped cleanly
> to a linear word order. Also, the programs currently cannot model
> higher level grammatical structures such as phrases and sentences.
>
> -- Matt Mahoney
Have you ever experimented with Link Grammar? (
http://www.link.cs.cmu.edu/link )
linkparser> I will probably not use this approach.
++++Time 0.00 seconds (0.00 total)
Found 1 linkage (1 had no P.P. violations)
Unique linkage, cost vector = (UNUSED=0 DIS=0 AND=0 LEN=11)
+--------------------------Xp-------------------------+
| +---------I--------+ |
| +------N------+ +-------Os------+ |
+--Wd--+-Sp*i+ +--E--+ | +---Dsu--+ |
| | | | | | | | |
LEFT-WALL I.p will.v probably not use.v this.d approach.n .
....something of the sort probably could be used to drive a predictor in
a way a bit closer to human language processing than simply ordering
words in a list.
-Tony
| |
| markn@ieee.org 2006-12-23, 6:55 pm |
| Matt Mahoney wrote:
> Semantics is just a statistical property of text. When we think of
> words like "rain" and "snow" being semantically related, it really
> means that these words are likely to appear near each other. So a
> compressor seeing one word, can predict the other.
> -- Matt Mahoney
I have problems with this, and while they may be rooted in ignorance,
it just sounds very perceptron-ish.
Let's say I'm in a colleague's office in one of two alternate realities.
I point to a picture on his wall and say "what's that?"
In scenario one, it's a decorative photo taken from a color enhanced
electron microscope capture. He says "a rock".
In the second, it's a google earth satellite image of some empty
landscape. He says "Iraq"
Both words are pronounced exactly the same, and I feel that what is
used to differentiate them is old-school Minsky AI semantics, where
I've built some sort of cognitive web.
I think you have to stretch too far to create a statistical model to
connect the words otherwise.
To make it fair, instead have the two sentences say:
"What's the pretty colored abstract looking photo from an electron
microscope?" "A Rock"
"What's that satellite photo?" "Iraq"
In the first case, using semantics, I'm able to disambiguate the word
without any previous statistical correlation. I'm able to jump from the
concept of an electron microscope to think about things that might be
studied in a scientific context and know that "a rock" is a pretty
reasonable answer. Even without previous statistics.
Or how about something like "My dog has shingles" (disambiguating
between the building supply and the disease)? You'd have zero
statistical help on this, but semantically you'd make the unlikely jump
that dogs get diseases but don't do construction.
Maybe this is all old news to everyone else and I'm in totally annoying
see-the-FAQ territory. I'm sure I would be in a different NG, but I'm
hoping that since you're trying to drag compression into some new areas
you can help with some first principles.
So anyway, to sum up, it seems to me that semantics involve a lot more
than statistics.
|
| Mark Nelson - http://marknelson.us
|
| |
| Matt Mahoney 2006-12-23, 6:55 pm |
|
bybell@rocketmail.com wrote:
> Matt Mahoney wrote:
>
>
> Have you ever experimented with Link Grammar? (
> http://www.link.cs.cmu.edu/link )
>
> linkparser> I will probably not use this approach.
> ++++Time 0.00 seconds (0.00 total)
> Found 1 linkage (1 had no P.P. violations)
> Unique linkage, cost vector = (UNUSED=0 DIS=0 AND=0 LEN=11)
>
> +--------------------------Xp-------------------------+
> | +---------I--------+ |
> | +------N------+ +-------Os------+ |
> +--Wd--+-Sp*i+ +--E--+ | +---Dsu--+ |
> | | | | | | | | |
> LEFT-WALL I.p will.v probably not use.v this.d approach.n .
>
> ...something of the sort probably could be used to drive a predictor in
> a way a bit closer to human language processing than simply ordering
> words in a list.
>
> -Tony
One problem with parsing natural language is you need to analyze
semantics first to do it right.
+------------------Xp-----------------+
| +---Js--+ |
+--Wd--+Sp*i+-----MVp-----+ +-Ds-+ |
| | | | | | |
LEFT-WALL I.p ate.v [pizza] with a fork.n .
Constituent tree:
(S (NP I)
(VP ate pizza
(PP with
(NP a fork)))
.)
+---------------------Xp--------------------+
+--Wd--+Sp*i+-----MVp-----+----Jp---+ |
| | | | | |
LEFT-WALL I.p ate.v [pizza] with pepperoni[?].n .
Constituent tree:
(S (NP I)
(VP ate pizza
(PP with
(NP pepperoni)))
.)
+-------------------Xp------------------+
| +---Js---+ |
+--Wd--+Sp*i+-----MVp-----+ +--Ds-+ |
| | | | | | |
LEFT-WALL I.p ate.v [pizza] with a friend.n .
Constituent tree:
(S (NP I)
(VP ate pizza
(PP with
(NP a friend)))
.)
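In the parses above, every "with" phrase is attached to "ate." One common statistical stand-in for the semantics Matt refers to is to compare how often each candidate head occurs with the preposition and its object. The sketch below assumes that such (head, preposition, object) counts are available. The toy counts and the name `choose_attachment` are invented for illustration, and Link Grammar itself is not used.

```python
from collections import Counter


def choose_attachment(verb, noun, prep, obj, triple_counts):
    """Attach the PP to whichever head co-occurs more often with (prep, obj).

    Ties go to the verb, an arbitrary choice for this sketch.
    """
    verb_score = triple_counts[(verb, prep, obj)]
    noun_score = triple_counts[(noun, prep, obj)]
    return "verb" if verb_score >= noun_score else "noun"


# Invented (head, preposition, object) counts standing in for corpus statistics.
triple_counts = Counter({
    ("ate", "with", "fork"): 12,
    ("ate", "with", "friend"): 9,
    ("pizza", "with", "pepperoni"): 15,
    ("pizza", "with", "fork"): 1,
})

for obj in ("fork", "pepperoni", "friend"):
    print(obj, choose_attachment("ate", "pizza", "with", obj, triple_counts))
```

A practical version would normalize by how often each head occurs and would smooth zero counts. Raw counts keep the sketch short.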
-- Matt Mahoney
| |
| Matt Mahoney 2006-12-23, 6:55 pm |
| markn@ieee.org wrote:
> So anyway, to sum up, it seems to me that semantics involve a lot more
> than statistics.
For humans, it does. It involves grounding, or associating words with
nonverbally acquired knowledge such as sensations, actions, and
emotions. You know that the sky is blue because you have seen it. But
for a language model, words are only associated with other words.
Google knows the sky is blue because "blue sky" returns more hits than
if you substitute other colors.
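Here is a toy version of the search-count argument: count which color word most often appears immediately before "sky" in a text. A local string stands in for Google hit counts. The color list and the function name `sky_color` are assumptions made for illustration.

```python
import re
from collections import Counter

# Candidate colors; this list is an assumption made for the example.
COLORS = {"red", "orange", "yellow", "green", "blue", "purple", "black", "white", "gray"}


def sky_color(text, colors=COLORS):
    """Pick the color that most often appears immediately before the word 'sky'."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(prev for prev, word in zip(words, words[1:])
                     if word == "sky" and prev in colors)
    return counts.most_common(1)[0][0] if counts else None


sample = ("The blue sky was clear. A red sky at night. "
          "We walked under a blue sky, and the blue sky stayed.")
print(sky_color(sample))
```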
My point is that if you interact with an AI through a text-only
interface, it is hard to tell the difference between these two types of
knowledge. Likewise for text prediction.
-- Matt Mahoney
| |
| bybell@rocketmail.com 2006-12-24, 6:56 pm |
| Matt Mahoney wrote:
> One problem with parsing natural language is you need to analyze
> semantics first to do it right.
Yes, to do it right as in if you're trying to derive some sort of
meaning from the utterance. Without extra knowledge, the second part
of "time flies like an arrow; fruit flies like a banana" is ambiguous.
But is it really necessary for a compressor to understand fully the
text going through it? (though yes, that would start going down the AI
route)
The path I'm thinking is that you're parsing the language merely to
derive the underlying English structure so you can more accurately
guess the probability of the next encountered part of speech or
whatever. That is, given that you know the enwik text is English, the
ordering of the words isn't random and more-or-less fits the recursive
structure of English so maybe that fact can be exploited as various
clauses and phrases will have opening and closing points. *shrugs*
What percentage of enwik do the glue words (like articles,
prepositions, etc.) used to maintain proper grammar account for? Do
they create much noise?
Anyway, given a large enough corpus of text, grammar modeling is
possibly unnecessary and would be a hindrance. I do like the idea
of grouping together related terms, as humans think in an associative
manner. Has the generation of those word lists been automated yet?
-Tony
| |
| Matt Mahoney 2006-12-24, 9:56 pm |
|
bybell@rocketmail.com wrote:
> Matt Mahoney wrote:
>
>
> Yes, to do it right as in if you're trying to derive some sort of
> meaning from the utterance. Without extra knowledge, the second part
> of "time flies like an arrow; fruit flies like a banana" is ambiguous.
> But is it really necessary for a compressor to understand fully the
> text going through it? (though yes, that would start going down the AI
> route)
It depends what you mean by "understand". Suppose a compressor
predicting the next byte in "the cat caught a mo_" assigns the highest
probability to the same byte that you would guess first. Suppose it
does this consistently for many examples, so well that you can't
distinguish its guesses from that of a human. Would you say the
program "understands" its input?
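The thought experiment presumes a model that ranks candidate next bytes. Below is a minimal sketch of one. It assumes a simple order-1 to order-4 character context model that falls back from the longest context seen in training. Real compressors such as paq8 mix many models instead. The tiny training string and the names `train` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


def train(text, max_order=4):
    """Count which character follows each context of length 1..max_order."""
    model = defaultdict(Counter)
    for i in range(len(text)):
        for k in range(1, max_order + 1):
            if i - k >= 0:
                model[text[i - k:i]][text[i]] += 1
    return model


def predict(model, history, max_order=4):
    """Return (next character, probability) from the longest context seen in training."""
    for k in range(min(max_order, len(history)), 0, -1):
        counts = model.get(history[-k:])
        if counts:
            char, n = counts.most_common(1)[0]
            return char, n / sum(counts.values())
    return None, 0.0


training = "the dog chased a mouse. the cat ate a mouse. the owl saw a mole."
model = train(training)
print(predict(model, "the cat caught a mo"))
```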
> The path I'm thinking is that you're parsing the language merely to
> derive the underlying English structure so you can more accurately
> guess the probability of the next encountered part of speech or
> whatever. That is, given that you know the enwik text is English, the
> ordering of the words isn't random and more-or-less fits the recursive
> structure of English so maybe that fact can be exploited as various
> clauses and phrases will have opening and closing points. *shrugs*
> What percentage of enwik do the glue words (like articles,
> prepositions, etc.) used to maintain proper grammar account for? Do
> they create much noise?
Typically half of text consists of the 100-200 most frequent words. I
would not call this noise. Rather, it makes compression easier.
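The coverage figure is easy to check on any text. This sketch assumes that words are runs of letters and apostrophes after lowercasing, which is only one of many possible tokenizations. `top_k_coverage` is a hypothetical name.

```python
import re
from collections import Counter


def top_k_coverage(text, k):
    """Fraction of word tokens covered by the k most frequent word types."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    covered = sum(n for _, n in Counter(words).most_common(k))
    return covered / len(words)


print(top_k_coverage("the cat and the dog and the bird", 2))
```

To try it on enwik8, read the file (or a prefix of it) as text and call `top_k_coverage(text, 150)`. The result will depend on the tokenization and on the XML markup in the file.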
Of course recognizing natural language grammar should lead to better
compression. It will allow you to make only those predictions that
lead to grammatically correct sentences. But I am not convinced that
recognizing grammar is the same as parsing. There are lots of people
who can form and recognize grammatically correct sentences but who have
never learned the difference between a noun and a verb.
> Anyway, given a large enough corpus of text, grammar modeling is
> possibly unnecessary and would be a hindrance. I do like the idea
> of grouping together related terms, as humans think in an associative
> manner. Has the generation of those word lists been automated yet?
>
> -Tony
I think just the opposite. As the corpus size grows, it will be
increasingly important to recognize correct grammar in longer
sentences. In a small corpus, there is not enough data to learn a
grammar. (Natural language grammar is so complex that I think this is
the only viable approach, as opposed to hand coding the rules. Even if
you could do it, it would add tens of MB to the decompressor). The
Hutter prize uses 100 MB of text, which is about the amount of language
that is exposed to a 3 year old child. I would not expect a language
model to learn adult level grammar with this much data. If you have 1
GB of text, then I would expect it.
As I mentioned in another post, durilca4linux_2 uses a dictionary where
the words were automatically grouped, but it was done manually in
paq8hp7. I know how to automatically construct such dictionaries, but
this is not the best approach. I think a better model would be one
which learns the relations between words, expressed internally as
vectors in hundreds of dimensions. I think the reason dictionaries
work is that they "pre-compress" the input to save time and memory,
which means more models can be added within the hardware limits.
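The "pre-compression" effect of a dictionary can be sketched as a reversible substitution of frequent words with short codes. This is a minimal illustration under assumed conventions: an escape character `\x01` followed by a one-character code, at most 255 words, and the hypothetical names `build_dictionary`, `encode` and `decode`. The dictionaries in paq8hp7 and durilca4linux_2 use their own code layouts and word orderings, which this sketch does not reproduce.

```python
import re
from collections import Counter

ESC = "\x01"  # hypothetical escape symbol, assumed never to occur in the input
WORD = re.compile(r"[A-Za-z]+")


def build_dictionary(text, size=255):
    """Most frequent words first; a word's position becomes its one-character code."""
    return [w for w, _ in Counter(WORD.findall(text)).most_common(size)]


def encode(text, dictionary):
    """Replace each dictionary word with ESC followed by a one-character code."""
    if ESC in text:
        raise ValueError("input contains the escape symbol")
    index = {w: i for i, w in enumerate(dictionary)}
    return WORD.sub(lambda m: ESC + chr(index[m.group(0)])
                    if m.group(0) in index else m.group(0), text)


def decode(data, dictionary):
    """Invert encode(): expand every ESC + code back into its word."""
    out, i = [], 0
    while i < len(data):
        if data[i] == ESC:
            out.append(dictionary[ord(data[i + 1])])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


text = "the cat saw the other cat near the mat"
dictionary = build_dictionary(text, size=3)
packed = encode(text, dictionary)
print(len(text), len(packed))
```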
-- Matt Mahoney
| |
| Rob Freeman 2007-01-06, 6:56 pm |
| Hi Matt,
Happy NY.
Look, Matt, you've got so much right, but you are just missing the
point.
Matt Mahoney wrote:
> ...
> As I mentioned in another post, durilca4linux_2 uses a dictionary where
> the words were automatically grouped, but it was done manually in
> paq8hp7. I know how to automatically construct such dictionaries, but
> this is not the best approach. I think a better model would be one
> which learns the relations between words, expressed internally as
> vectors in hundreds of dimensions.
Take a look at my "text" example again:
"AX...DX...DB...AZ...YZ...YC...A_"
I'm saying that a representation of meaning with the power you are
looking for is right there in the text.
Make "dictionary" lists for my example. As you know the algorithm is to
cluster contexts. In my example A and D have contexts in common (_X),
as do A and Y (_Z).
Two "dictionary" lists are possible for my example (A, D) and (A, Y).
That is what I hoped you would find. And just like "dictionaries"
derived manually (e.g. your lists for paq8hp7 etc.) they are
contradictory (if A is similar to D that predicts AB, and if A is
similar to Y that predicts AC.)
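Rob's example can be checked mechanically. This sketch assumes that each "..."-separated chunk is a two-symbol sequence and that only the immediate right context counts. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def parse_pairs(example):
    """Turn Rob's notation into (symbol, next symbol) pairs, skipping the '_' query."""
    return [(chunk[0], chunk[1]) for chunk in example.split("...") if "_" not in chunk]


def successors(pairs):
    succ = defaultdict(set)
    for left, right in pairs:
        succ[left].add(right)
    return succ


def shared_contexts(pairs):
    """Pairs of symbols that share at least one right context, and what they share."""
    succ = successors(pairs)
    return {(a, b): succ[a] & succ[b]
            for a, b in combinations(sorted(succ), 2) if succ[a] & succ[b]}


def cluster_predictions(symbol, pairs):
    """For each symbol clustered with `symbol`, what that clustering predicts next."""
    succ = successors(pairs)
    own = succ.get(symbol, set())
    result = {}
    for a, b in shared_contexts(pairs):
        if symbol in (a, b):
            partner = b if a == symbol else a
            result[partner] = succ[partner] - own
    return result


pairs = parse_pairs("AX...DX...DB...AZ...YZ...YC...A_")
print(shared_contexts(pairs))
print(cluster_predictions("A", pairs))
```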
The point is exactly that yes, you have all these different
"dictionary" orders possible. But this is not a flaw in the method of
automatically deriving dictionaries by clustering contexts. Look at it
the other way round. It is a power of text to represent many
(contradictory) dictionaries.
The representation you are looking for, the "vector" representation for
meaning below the dictionary level, the one which will represent the
meaning of all these contradictory dictionaries in an objective way...
is text itself. But you need to keep the text, not a dictionary (which
must be ad-hoc because of all the different orders possible) or a
language model (which will be random because of all the different
orders possible.)
I don't know if I'm making myself clear yet. I fear not.
The representation of meaning you are looking for is text. It is
pointless to think of using it to compress text. If you do you will
only get a compromise solution, an averaging of meaning, not a
representation for meaning. Instead think of text itself as a very
compact representation for meaning (by virtue of all the contradictory
dictionary orders you can squeeze out of it.)
-Rob
| |
| Rob Freeman 2007-01-06, 6:56 pm |
| Hi Mark,
You are right to "have problems with this". It rests on the nature of
"meaning". That is a deep question. There are no generally accepted
answers I'm aware of.
Personally I've been convinced for a while that the best working
definition for "meaning" is: "an organization of information". As such
I don't have a problem interpreting statistics in terms of "meaning".
It follows from my definition.
I've come across some support for this definition. (Not least what I
understand of Marcus Hutter's idea that compression is enough to
predict "intelligent" behavior.)
Here's something else I came across recently which strikes me as an
expression of basically the same ideas (in Thomas Kuhn, The Structure
of Scientific Revolutions, pp. 44-45, citing Ludwig Wittgenstein,
Philosophical Investigations, trans. G. E. M. Anscombe, pp. 31-36):
<<<
What need we know, Wittgenstein asked, in order that we apply terms
like 'chair', or 'leaf', or 'game' unequivocally and without provoking
argument?
That question is very old and has generally been answered by saying
that we must know, consciously or intuitively, what a chair, or a leaf,
or game _is_. We must, that is, grasp some set of attributes that all
games and only games have in common. Wittgenstein, however, concluded
that, given the way we use language and the sort of world to which we
apply it, there need be no such set of characteristics. Though a
discussion of _some_ of the attributes shared by a _number_ of games
or chairs or leaves often helps us learn how to employ the
corresponding term, there is no set of characteristics that is
simultaneously applicable to all members of the class and to them
alone. Instead, confronted with a previously unobserved activity, we
apply the term 'game' because what we are seeing bears a close "family
resemblance" to a number of the activities that we have previously
learned to call by that name. For Wittgenstein, in short, games, and
chairs, and leaves are natural families, each constituted by a network
of overlapping and crisscross resemblances. The existence of such a
network sufficiently accounts for our success in identifying the
corresponding object or activity.
<<<
Notice how Wittgenstein's definition of meaning is fundamentally a set.
Note also that by his definition there is no single sufficient set,
only many, mutually contradictory sets.
If you're interested in exploring such "set theoretic" ideas about
meaning I strongly recommend the whole of Kuhn's book. Of course I knew
Kuhn was famous for proposing scientific progress/knowledge was
discontinuous (and partially subjective), but I never realized how much
he had to say about the nature of knowledge itself. In fact he defines
knowledge fundamentally as sets of examples. This equivalence between
sets of examples, the original sense of "paradigm", and knowledge, is
where his famous use of the word "paradigm" in the sense of "world view"
or "scientific theory" comes from.
Note also that (I would argue) attempts to base mathematics in set
theory 100 or so years ago were not unrelated to an interpretation of
"meaning" in terms of sets.
I can give you other refs. but it might be more than would interest a
compression news group.
-Rob
markn@ieee.org wrote:
> Matt Mahoney wrote:
>
>
> I have problems with this, and while they may be rooted in ignorance,
> it just sounds very perceptron-ish.
>
> Let's say I'm in a colleague's office in one of two alternate realities.
> I point to a picture on his wall and say "what's that?"
>
> In scenario one, it's a decorative photo taken from a color enhanced
> electron microscope capture. He says "a rock".
>
> In the second, it's a google earth satellite image of some empty
> landscape. He says "Iraq"
>
> Both words are pronounced exactly the same, and I feel that what is
> used to differentiate them is old-school Minsky AI semantics, where
> I've built some sort of cognitive web.
>
> I think you have to stretch too far to create a statistical model to
> connect the words otherwise.
>
> To make it fair, instead have the two sentences say:
>
> "What's the pretty colored abstract looking photo from an electron
> microscope?" "A Rock"
>
> "What's that satellite photo?" "Iraq"
>
> In the first case, using semantics, I'm able to disambiguate the word
> without any previous statistical correlation. I'm able to jump from the
> concept of an electron microscope to think about things that might be
> studied in a scientific context and know that "a rock" is a pretty
> reasonable answer. Even without previous statistics.
>
> Or how about something like "My dog has shingles" (disambiguating
> between the building supply and the disease)? You'd have zero
> statistical help on this, but semantically you'd make the unlikely jump
> that dogs get diseases but don't do construction.
>
> Maybe this is all old news to everyone else and I'm in totally annoying
> see-the-FAQ territory. I'm sure I would be in a different NG, but I'm
> hoping that since you're trying to drag compression into some new areas
> you can help with some first principles.
>
> So anyway, to sum up, it seems to me that semantics involve a lot more
> than statistics.
>
> |
> | Mark Nelson - http://marknelson.us
> |
| |
| Matt Mahoney 2007-01-06, 6:56 pm |
|
Rob Freeman wrote:
> Hi Matt,
>
> Happy NY.
>
> Look, Matt, you've got so much right, but you are just missing the
> point.
>
> Matt Mahoney wrote:
>
> Take a look at my "text" example again:
>
> "AX...DX...DB...AZ...YZ...YC...A_"
>
> I'm saying that a representation of meaning with the power you are
> looking for is right there in the text.
>
> Make "dictionary" lists for my example. As you know the algorithm is to
> cluster contexts. In my example A and D have contexts in common (_X),
> as do A and Y (_Z).
>
> Two "dictionary" lists are possible for my example (A, D) and (A, Y).
> That is what I hoped you would find. And just like "dictionaries"
> derived manually (e.g. your lists for paq8hp7 etc.) they are
> contradictory (if A is similar to D that predicts AB, and if A is
> similar to Y that predicts AC.)
>
> The point is exactly that yes, you have all these different
> "dictionary" orders possible. But this is not a flaw in the method of
> automatically deriving dictionaries by clustering contexts. Look at it
> the other way round. It is a power of text to represent many
> (contradictory) dictionaries.
>
> The representation you are looking for, the "vector" representation for
> meaning below the dictionary level, the one which will represent the
> meaning of all these contradictory dictionaries in an objective way...
> is text itself. But you need to keep the text, not a dictionary (which
> must be ad-hoc because of all the different orders possible) or a
> language model (which will be random because of all the different
> orders possible.)
>
> I don't know if I'm making myself clear yet. I fear not.
>
> The representation of meaning you are looking for is text. It is
> pointless to think of using it to compress text. If you do you will
> only get a compromise solution, an averaging of meaning, not a
> representation for meaning. Instead think of text itself as a very
> compact representation for meaning (by virtue of all the contradictory
> dictionary orders you can squeeze out of it.)
>
> -Rob
I understand your text model now. If A shares contexts with both D and
Y, then the model should predict whatever follows D and Y. Why is this
not compatible with a vector representation where A is close to both D
and Y? This model would predict X, B, Z, and C.
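Matt's reading can be sketched as a nearest-neighbor mix. Each symbol is represented by counts of what follows it, and the successors of the other symbols are mixed in proportion to their cosine similarity to A. The successor-count vectors, the cosine weighting and the function names are assumptions for illustration, not a model taken from paq8 or durilca.

```python
import math
from collections import Counter, defaultdict


def successor_vectors(pairs):
    """Each symbol becomes a vector of counts over the symbols that follow it."""
    vecs = defaultdict(Counter)
    for left, right in pairs:
        vecs[left][right] += 1
    return vecs


def cosine(u, v):
    dot = sum(n * v[k] for k, n in u.items())
    nu = math.sqrt(sum(n * n for n in u.values()))
    nv = math.sqrt(sum(n * n for n in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_next(symbol, pairs):
    """Mix what follows every other symbol, weighted by its similarity to `symbol`."""
    vecs = successor_vectors(pairs)
    target = vecs.get(symbol, Counter())
    mix = Counter()
    for other, succ in vecs.items():
        sim = cosine(target, succ) if other != symbol else 0.0
        if sim > 0:
            for nxt, n in succ.items():
                mix[nxt] += sim * n
    total = sum(mix.values())
    return {k: w / total for k, w in mix.items()} if total else {}


pairs = [(c[0], c[1]) for c in "AX...DX...DB...AZ...YZ...YC".split("...")]
print(predict_next("A", pairs))
```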
-- Matt Mahoney
| |
| Rob Freeman 2007-01-06, 6:56 pm |
| Matt Mahoney wrote:
> ...
> I understand your text model now. If A shares contexts with both D and
> Y, then the model should predict whatever follows D and Y. Why is this
> not compatible with a vector representation where A is close to both D
> and Y?
It is compatible with such a representation. But notice that
abstracting in this way means you throw away (semantic?) information
which can distinguish between AB and AC in practice (just not
statistically, the statistics are random.)
For a complete representation you want to know not only that A is close
to both D and Y. You also want to know that in some ways A is close to
D but not Y, and in other ways A is close to Y but not D. (To contrast
my simple example with one where D and Y are similar to A in exactly
the same ways, say: AX...DX...DB...YX...YC...AZ...A_, which also
predicts X, B, Z, and C.)
This information is not relevant for compression, because AB and AC are
equally likely (random/incompressible) in both cases, but it is
relevant for the representation of "meaning" (if "meaning" is equated
with clustered contexts.) And by distinctions within that
representation for meaning, clustering A with D or A with Y, this
(semantic?) information both selects, and is selected by, the
occurrence of AB or AC in sequence.
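Rob's contrast between the two sequences can be checked with a small sketch. It uses the same reading of his notation: two-symbol chunks, with only the immediate right context counted. The function names are hypothetical. The sketch shows that the shared-context structure differs between the two sequences, while B and C receive equal support in both.

```python
from collections import defaultdict
from itertools import combinations


def successors(example):
    """Map each symbol to the set of symbols that follow it in Rob's notation."""
    succ = defaultdict(set)
    for chunk in example.split("..."):
        if "_" not in chunk:
            succ[chunk[0]].add(chunk[1])
    return succ


def shared_contexts(succ):
    """Which symbol pairs share a right context, and which contexts they share."""
    return {(a, b): succ[a] & succ[b]
            for a, b in combinations(sorted(succ), 2) if succ[a] & succ[b]}


def b_and_c_support(succ, symbol="A"):
    """How many symbols sharing a context with `symbol` are followed by B, and by C."""
    own = succ.get(symbol, set())
    neighbors = [s for s in succ if s != symbol and succ[s] & own]
    return (sum("B" in succ[s] for s in neighbors),
            sum("C" in succ[s] for s in neighbors))


first = successors("AX...DX...DB...AZ...YZ...YC...A_")
second = successors("AX...DX...DB...YX...YC...AZ...A_")
for succ in (first, second):
    print(shared_contexts(succ), b_and_c_support(succ))
```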
But note, the distinctions which select really come from the text. It
is not the case that some abstract representation for meaning selects
between two cases, and that text itself only expresses some
impoverished language model. Both our (random) language models and the
"meaning" which selects over them are products of the text itself.
Different ways of clustering text produce meaning, _and_ distinguish
different syntactic sequences.
To escape from the (apparent) poverty of random language models we
don't need a separate model of meaning. We just need to understand that
the model of knowledge/grammar we get by clustering text must always be
ad-hoc/partial.
-Rob
| |
| Matt Mahoney 2007-01-06, 6:56 pm |
|
Rob Freeman wrote:
> Matt Mahoney wrote:
>
> It is compatible with such a representation. But notice that
> abstracting in this way means you throw away (semantic?) information
> which can distinguish between AB and AC in practice (just not
> statistically, the statistics are random.)
>
> For a complete representation you want to know not only that A is close
> to both D and Y. You also want to know that in some ways A is close to
> D but not Y, and in other ways A is close to Y but not D. (To contrast
> my simple example with one where D and Y are similar to A in exactly
> the same ways, say: AX...DX...DB...YX...YC...AZ...A_, which also
> predicts X, B, Z, and C.)
>
> This information is not relevant for compression, because AB and AC are
> equally likely (random/incompressible) in both cases, but it is
> relevant for the representation of "meaning" (if "meaning" is equated
> with clustered contexts.) And by distinctions within that
> representation for meaning, clustering A with D or A with Y, this
> (semantic?) information both selects, and is selected by, the
> occurrence of AB or AC in sequence.
>
> But note, the distinctions which select really come from the text. It
> is not the case that some abstract representation for meaning selects
> between two cases, and that text itself only expresses some
> impoverished language model. Both our (random) language models and the
> "meaning" which selects over them are products of the text itself.
> Different ways of clustering text produce meaning, _and_ distinguish
> different syntactic sequences.
>
> To escape from the (apparent) poverty of random language models we
> don't need a separate model of meaning. We just need to understand that
> the model of knowledge/grammar we get by clustering text must always be
> ad-hoc/partial.
>
> -Rob
I think it is possible to represent both the grammatical role and the
semantics of a word using a single vector. In half of the dimensions,
words are close if they appear in similar immediate contexts (syntax).
In the other half, words are close if they appear in similar long range
contexts (semantics). A model will predict words by combining the
syntactic and semantic predictions.
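Here is a minimal sketch of the two-half vector Matt proposes. It rests on several assumptions: 64 hashed dimensions per half; immediate contexts taken as the adjacent words, tagged by side; long-range contexts taken as the other words in the same paragraph; and a weighted sum of the two similarities as one possible way to combine them. The function names and example sentences are hypothetical.

```python
import math
import zlib
from collections import defaultdict

HALF = 64  # dimensions per half; a hypothetical choice for the sketch


def bucket(feature):
    return zlib.crc32(feature.encode("utf-8")) % HALF


def word_vectors(paragraphs):
    """Build [immediate-context half | paragraph-context half] count vectors."""
    vecs = defaultdict(lambda: [0.0] * (2 * HALF))
    for para in paragraphs:
        words = para.split()
        for i, w in enumerate(words):
            # Syntactic half: the immediate left and right neighbors, tagged by side.
            if i > 0:
                vecs[w][bucket("L:" + words[i - 1])] += 1
            if i + 1 < len(words):
                vecs[w][bucket("R:" + words[i + 1])] += 1
            # Semantic half: every other word in the same paragraph.
            for j, other in enumerate(words):
                if j != i:
                    vecs[w][HALF + bucket(other)] += 1
    return vecs


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity(vecs, a, b, alpha=0.5):
    """Return (syntactic, semantic, combined) similarity; alpha weights syntax."""
    syn = cosine(vecs[a][:HALF], vecs[b][:HALF])
    sem = cosine(vecs[a][HALF:], vecs[b][HALF:])
    return syn, sem, alpha * syn + (1 - alpha) * sem


paragraphs = ["we went to the store today", "she ran to a store quickly"]
vecs = word_vectors(paragraphs)
print(similarity(vecs, "the", "a"))
```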
Of course this is not a complete language model. You also need to
represent high level grammatical structures such as phrases and
sentences.
-- Matt Mahoney
| |
| Sportman 2007-01-06, 6:56 pm |
| markn@ieee.org wrote:
> Matt Mahoney wrote:
>
>
> I have problems with this, and while they may be rooted in ignorance,
> it just sounds very perceptron-ish.
> So anyway, to sum up, it seems to me that semantics involve a lot more
> than statistics.
I think all there is are waves, and AI/meaning is the binding or unbinding
of waves; in other words, AI is counting. Every time a letter (waves),
word (waves), some words (waves), sentence (waves), part of a story
(waves), story (waves) or any other data (waves) passes by, it binds or
unbinds waves, which is counting.
A simplified example: after random information (waves) from after the
year 1955 has passed by, the counts of the word (waves) "Einstein" paired
with other single words (waves) could look like this (highest count first):
Einstein and science
Einstein and physics
Einstein and albert
Einstein and theory
Einstein and America
Einstein and iq
Einstein and relativity
Einstein and atom
Einstein and 1955
Einstein and 1945
Einstein and German
Einstein and genius
Einstein and 1939
Einstein and letter
Einstein and Nobel
Einstein and Manhattan
Einstein and 1905
Einstein and warning
Einstein and atomic
Einstein and bomb
Einstein and 1879
Einstein and patent
Einstein and prize
Einstein and Swiss
Einstein and Bose
Einstein and mc
Einstein and physicist
Einstein and Eisenhower
Einstein and mc2
Einstein and e=mc2
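The co-occurrence list above amounts to counting, for a target word, how many documents also contain each other word. Below is a minimal sketch that assumes document-level co-occurrence and a simple regular-expression tokenizer. The sample documents and the name `cooccurrence_ranking` are made up for illustration.

```python
import re
from collections import Counter


def cooccurrence_ranking(documents, target):
    """Rank words by the number of documents they share with `target`, highest first."""
    target = target.lower()
    counts = Counter()
    for doc in documents:
        words = set(re.findall(r"[a-z0-9=]+", doc.lower()))
        if target in words:
            counts.update(words - {target})
    return counts.most_common()


docs = ["Einstein developed the theory of relativity",
        "Einstein won the Nobel prize in physics",
        "Einstein and relativity physics",
        "Bohr studied physics"]
print(cooccurrence_ranking(docs, "Einstein")[:3])
```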
Every wave receiver (for example, a brain) binds or unbinds (counts)
waves differently depending on location, time and tuning (wave filter).
The question is whether computer software can become as intelligent
as a human being. Is it only the written words (waves), the spoken
words (waves), or the signals (waves) of the other three senses (taste,
smell and feel) that cause a reaction (waves)? The written or spoken
words and the signals of the other three senses may well be part of the
original thought wave: the not "visible" wave that the brain (a tunable
wave receiver) receives directly (telepathically) from the surrounding
waves, after they were changed by the thought of the original brain (a
tunable wave transmitter). In that case there are more thought waves
out there than the five human sense waves. And what happens to thought
waves once they are transmitted: do they remain forever, to be filtered
out later, or do they disappear over time?
Persons, families, friends, villages, cities, states, cultures,
countries, continents, planets, star systems, galaxies, universes,
etc.: every wave system has its own local wave characteristics
(storage) that influence its surroundings.
Prediction is counting and comparing like a balance.
| |
| Rob Freeman 2007-01-06, 6:56 pm |
| Matt Mahoney wrote:
>
> I think it is possible to represent both the grammatical role and the
> semantics of a word using a single vector. In half of the dimensions,
> words are close if they appear in similar immediate contexts (syntax).
> In the other half, words are close if they appear in similar long range
> contexts (semantics). A model will predict words by combining the
> syntactic and semantic predictions.
What do you mean by "immediate" and "long range" contexts?
In the example "AX...DX...DB...AZ...YZ...YC...A_" my relevant distance
vectors (both defining meaning, and distinguishing syntax) are (A,D)
and (A,Y). Vector (A,D) distinguishes the meaning that "in some ways A
is close to D but not Y" (associated with syntax AB), and vector (A,Y)
distinguishes the meaning that "in other ways A is close to Y but not
D" (associated with syntax AC.)
Can you list your "single vector" for this example?
-Rob
| |
| Matt Mahoney 2007-01-06, 6:56 pm |
|
Rob Freeman wrote:
> Matt Mahoney wrote:
>
> What do you mean by "immediate" and "long range" contexts?
>
> In the example "AX...DX...DB...AZ...YZ...YC...A_" my relevant distance
> vectors (both defining meaning, and distinguishing syntax) are (A,D)
> and (A,Y). Vector (A,D) distinguishes the meaning that "in some ways A
> is close to D but not Y" (associated with syntax AB), and vector (A,Y)
> distinguishes the meaning that "in other ways A is close to Y but not
> D" (associated with syntax AC.)
>
> Can you list your "single vector" for this example?
>
> -Rob
Your example uses immediate contexts to classify symbols. So given
AX...DX, A and D appear in the same immediate-right context. A and D
would be close in syntactic vector space. As another example, given
"to the store...to a store", then "the" and "a" appear in the same
immediate context.
A long range context is spread over a larger window, for example, two
words that appear in the same paragraph or document would have the same
long range context. For example, the words "syntax" and "semantics"
both appear in this post, although their immediate contexts are
sometimes different. Their semantic vectors are close together.
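The two kinds of context can be made concrete with sets instead of vectors. In this sketch, an immediate context is the (left word, right word) pair around an occurrence, and a long-range context is every other word in the same document. Both definitions and the function names are assumptions made for illustration.

```python
def immediate_contexts(words, target):
    """The (left, right) neighbor pairs around each occurrence of `target`."""
    return {(words[i - 1] if i > 0 else None,
             words[i + 1] if i + 1 < len(words) else None)
            for i, w in enumerate(words) if w == target}


def long_range_contexts(documents, target):
    """Every other word in any document that contains `target`."""
    context = set()
    for doc in documents:
        words = doc.split()
        if target in words:
            context.update(w for w in words if w != target)
    return context


words = "go to the store and then back to a store".split()
print(immediate_contexts(words, "the"), immediate_contexts(words, "a"))

posts = ["syntax orders words into sentences",
         "semantics relates words to meaning",
         "this post mentions syntax and semantics"]
print(sorted(long_range_contexts(posts, "syntax") & long_range_contexts(posts, "semantics")))
```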
-- Matt Mahoney
| |
| Rob Freeman 2007-01-07, 7:55 am |
| Matt Mahoney wrote:
> Rob Freeman wrote:
>
> Your example uses immediate contexts to classify symbols...
>
> A long range context is spread over a larger window...
| | |