Home > Archive > Unix Programming > July 2006 > fork() question
| Below is simple code:
#include <stdio.h>      /* printf, perror */
#include <stdlib.h>     /* malloc, exit */
#include <sys/types.h>
#include <unistd.h>     /* fork, getpid, getppid */

int main(void) {
    int *p = malloc(sizeof(int));
    pid_t pid;

    *p = 111;
    if ((pid = fork()) < 0) {
        perror("fork error");
    } else if (pid == 0) {
        *p = 222;   /* only the child executes this */
        printf("In child %d *p: %d\n", getpid(), *p);
    }
    if (pid > 0) {
        printf("In parent %d *p: %d, child=%d\n", getpid(), *p, pid);
    } else if (pid == 0) {
        printf("In child %d *p: %d, parent=%d\n", getpid(), *p, getppid());
    }
    exit(0);
}
The output is:
In child 27903 *p: 222
In child 27903 *p: 222, parent=27902
In parent 27902 *p: 111, child=27903
Do the parent and the child share the pointer p? Or does the OS just create
another p for the child?
What is the memory image after fork?
The output shows that *p has different content in parent and child.
Thanks a lot.
Jack
| |
| Francesco Frigo 2006-07-18, 7:01 pm |
| Hi,
> Do the parent and the child share the pointer p? Or the OS just create
> another p for the child.
> What is the memory image after fork?
There's a mechanism called "Copy on Write".
What happens is this:
after the fork returns there are two processes (parent and child)
executing the very same instructions and sharing the very same address
space, with one difference: the parent's memory pages are marked
read/write, whereas the child memory pages are read-only.
Whenever the child attempts to write on a variable in memory, a Page Fault
happens, and the operating system realizes it was a forked address space,
therefore it makes an exact copy of the page for the child and marks it
read/write.
This has an advantage: the parent and the child share as many
physical pages as possible, and the actual splitting happens only for the
pages that are written to, whereas all the others stay shared (thus not
wasting physical RAM).
Hope this helps.
Sincerely,
Francesco Frigo
| |
|
| Thanks a lot.
Francesco Frigo wrote:
> Hi,
>
>
> There's a mechanism called "Copy on Write".
> What happens is this:
> after the fork returns there are two processes (parent and child)
> executing the very same instructions and sharing the very same address
> space, with one difference: the parent's memory pages are marked
> read/write, whereas the child memory pages are read-only.
>
> Whenever the child attempts to write on a variable in memory, a Page Fault
> happens, and the operating system realizes it was a forked address space,
> therefore it makes an exact copy of the page for the child and marks it read/write.
~~~~~~~~~~~~~~~~~~~~~~~~~~
This means that the names of the variables are the same in parent and
child, but the two variables are located at different places in memory. Am I
right?
Another situation: the child does not change the value of the
variable, but the parent does. Will the OS then make an
exact copy of the page for the child and mark it read/write?
Thanks again.
Jack
| |
| Gordon Burditt 2006-07-18, 9:58 pm |
| >#include <unistd.h>
>#include <sys/types.h>
>int main(){
>
> int *p = malloc(sizeof(int));
>
> *p = 111;
> pid_t pid;
>
> if ((pid = fork()) < 0) {
> perror("fork error\n");
> } else if (pid == 0) {
> *p = 222;
> printf("In child %d *p: %d\n", getpid(), *p);
> }
>
> if(pid > 0){
> printf("In parent %d *p: %d, child=%d\n", getpid(), *p, pid);
> }
> else if(pid == 0){
> printf("In child %d *p: %d, parent=%d\n", getpid(), *p, getppid());
> }
>
> exit(0);
>}
>
>The output is:
>In child 27903 *p: 222
>In child 27903 *p: 222, parent=27902
>In parent 27902 *p: 111, child=27903
>
>Do the parent and the child share the pointer p? Or the OS just create
>another p for the child.
When you fork(), the address space logically gets COPIED [*] so future
changes in parent or child don't affect the other one.
If you print the value of the pointer p, you'll notice it's the
same value in parent and child, but it refers to different memory.
>What is the memory image after fork?
Immediately after: two memory images, identical except for the
value of pid returned.
>The output shows that *p has different content in parent and child.
Parent and child do not share memory other than starting with
identical but separate copies of memory. Any changes after that
are individual unless, of course, they make identical changes in
both parent and child. If you want shared memory, you know where
to find it (mmap(), shm*() ).
[*] Whether or not fork() is actually implemented with copy-on-write
or various other tricks with memory mapping is irrelevant here
except for performance. Systems actually copying the whole writable
address space on fork() either have very lacking memory management
hardware or the implementation just sucks.
Gordon L. Burditt
| |
| Gordon Burditt 2006-07-18, 9:58 pm |
| >This means that the names of the variables are the same in parent and
>child, but the two variables locate at different place in memory. Am I
>right?
Different memory, yes, but the same virtual address (but in different
processes).
>Another situation is: the child does not change the value of the
>variable. The parent change the variable, then will the OS makes an
>exact copy of the page for the child and marks it read/write?
fork() logically makes a COPY at the time of fork(), whether or not
the actual mechanism is copy-on-write. If, after the fork(), one
process changes its copy, nothing happens to a different copy.
In the situation you describe, the OS makes an exact copy of the
page for the *PARENT* and marks it read/write. Consider a situation
of pages shared after a fork-a-thon where there are dozens of
processes sharing pages (say, a shell setting up a really long
pipeline).
Gordon L. Burditt
| |
| Paul Pluzhnikov 2006-07-18, 9:58 pm |
| Francesco Frigo <frigofra@NOSPAMtin.it> writes:
> There's a mechanism called "Copy on Write".
True.
> What happens is this:
> after the fork returns there are two processes (parent and child)
> executing the very same instructions and sharing the very same address
> space, with one difference: the parent's memory pages are marked
> read/write, whereas the child memory pages are read-only.
False.
The memory pages are read-only in *both* the parent and the child.
> Whenever the child
or the parent
> attempts to write on a variable in memory, a Page Fault
> happens, and the operating system realizes it was a forked address space,
realizes that this is a COW page
> therefore it makes an exact copy of the page for the child
for whichever process wrote to it; decrements reference count on
the COW page; if the count is 1 (i.e. the page is no longer shared
with anybody else) marks the COW page RW as well
> and marks it read/write.
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
| |
|
|
Paul Pluzhnikov wrote:
> Francesco Frigo <frigofra@NOSPAMtin.it> writes:
>
>
> True.
>
>
> False.
> The memory pages are read-only in *both* the parent and the child.
>
>
> or the parent
>
>
> realizes that this is a COW page
>
>
> for whichever process wrote to it; decrements reference count on
> the COW page; if the count is 1 (i.e. the page is no longer shared
> with anybody else) marks the COW page RW as well
Thanks a lot.
In the following code:
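#include <stdio.h>      /* printf, perror */
#include <stdlib.h>     /* exit */
#include <sys/types.h>
#include <unistd.h>     /* fork, getpid, getppid */

int main(void) {
    int i;
    pid_t pid = 10;   /* any positive value, so the loop starts */

    for( i = 0; (i <= 9)&&( pid > 0); i++){ //LINE1
        if ((pid = fork()) < 0) {
            perror("fork error");
        } else if (pid == 0) {
            printf("child %d, parent:%d pid:%d\n", getpid(), getppid(), pid);
        }
    }
    exit(0);
}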
Ten children are created. The parent and its ten children share the
same memory
pages, right? And the reference count on the COW page is 11, right?
And I found that at LINE1, without pid>0, it is a disaster.
Thanks.
Jack
| |
| Barry Margolin 2006-07-19, 4:00 am |
| In article <1153253883.847426.100390@35g2000cwc.googlegroups.com>,
"Jack" <junw2000@gmail.com> wrote:
> Do the parent and the child share the pointer p? Or the OS just create
> another p for the child.
> What is the memory image after fork?
> The output shows that *p has different content in parent and child.
It sounds like you don't understand the difference between virtual
addresses and physical addresses.
Think of a virtual address as being like a room number in a building.
Two different buildings can have a room number 101, but they aren't the
same room, and the contents can be different. It's the same with
virtual addresses and processes -- two different processes can have
pointers with the same value, but they point to locations within that
process's memory, so you get different values when you dereference them.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
| |
| Herbert Pophal 2006-07-19, 8:02 am |
| Jack wrote:
> Paul Pluzhnikov wrote:
>
> Thanks a lot.
> In the following code:
>
> #include <unistd.h>
> #include <sys/types.h>
> int main(){
> int i;
> pid_t pid = 10;
>
> for( i = 0; (i <= 9)&&( pid > 0); i++){ //LINE1
> if ((pid = fork()) < 0) {
> perror("fork error\n");
> } else if (pid == 0) {
> printf("child %d, parent:%d pid:%d\n", getpid(), getppid(),pid);
> }
> }
>
> exit(0);
> }
>
> Ten children are created. The parent and its ten children share the
> same memory
> pages, right? And the reference count on the COW page is 11, right?
>
> And I found that at LINE1 without pid>0, it is disaster.
No doubt about that. If the parent fails to fork, pid is <0. Without
testing pid>0 it would keep trying to fork for the rest of the ten
iterations. The children behave the same way: in a child, pid==0, so
without the pid > 0 test in the for statement it continues looping and
creates children of its own. The disaster is just an avalanche of children.
You should terminate if pid<0, and also prevent your children from
continuing the loop, probably by terminating them, instead of testing
pid>0 within for().
Herbert
| |
| Paul Pluzhnikov 2006-07-19, 8:02 am |
| "Jack" <junw2000@gmail.com> writes:
> In the following code:
>
> #include <unistd.h>
> #include <sys/types.h>
> int main(){
> int i;
> pid_t pid = 10;
>
This code "looks weird":
> for( i = 0; (i <= 9)&&( pid > 0); i++){ //LINE1
Usually this is expressed as:
for(i = 0; i < 10 && 0 < pid; i++) {
[Counting 10 times from 0; avoid unnecessary parenths; also putting
the lesser value on the left of comparison makes it easier to
understand.]
> Ten children are created. The parent and its ten children share the
> same memory pages, right?
They share most of the same memory pages; but at least the page on
which the variable 'pid' resides is un-shared as soon as fork()
returns and its return value is stored into 'pid'.
> And the reference count on the COW page is 11, right?
It could be significantly higher on some pages.
For example, libc.so* is shared by most processes on the system;
and many of them do not modify some of libc data pages; causing
their reference count to approach the number of running processes.
> And I found that at LINE1 without pid>0, it is disaster.
Yes, that would cause 1023 processes to be created instead of the
10 you intended.
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
| |
| Pascal Bourguignon 2006-07-19, 7:01 pm |
| Paul Pluzhnikov <ppluzhnikov-nsp@charter.net> writes:
> This code "looks weird":
>
>
> Usually this is expressed as:
>
> for(i = 0; i < 10 && 0 < pid; i++) {
>
> [Counting 10 times from 0; avoid unnecessary parenths; also putting
> the lesser value on the left of comparison makes it easier to
> understand.]
But without parentheses, are you really sure that it's not interpreted
as (i<(10&&0))<pid ?
Can you recite the 36 precedence levels of C?
And those of C++?
I'd write it as:
for(i=0;(i<10)&&(0<pid);i++){
to avoid slowing down thinking about irrelevant details such as
precedence levels...
--
__Pascal Bourguignon__ http://www.informatimago.com/
"Klingon function calls do not have "parameters" -- they have
"arguments" and they ALWAYS WIN THEM."
| |
| Francesco Frigo 2006-07-19, 7:01 pm |
| Hi,
> False.
> The memory pages are read-only in *both* the parent and the child.
you're completely right.
However this depends on the specific implementation too... so I tried to
explain a simpler approach, which would cause the same behaviour.
Thanks for pointing it out though.
PS - You need a base case on your signature. :-)
Sincerely,
Francesco Frigo
| |
| Paul Pluzhnikov 2006-07-19, 7:01 pm |
| Pascal Bourguignon <pjb@informatimago.com> writes:
> But without parentheses, are you really sure that it's not interpreted
> as (i<(10&&0))<pid ?
Yes I am (as, I am sure, is anyone who spends any significant time
looking at or writing C/C++ code).
> Can you recite the 36 precedence levels of C?
No. But I can tell that '&&' has lower precedence than almost
everything else; only '||', '?:', assignment and the comma operator
bind less tightly.
> I'd write it as:
>
> for(i=0;(i<10)&&(0<pid);i++){
>
> to avoid slowing down thinking about irrelevant details such as
> precedence levels...
The 'a < b && p != NULL' is such a common idiom (starting with K&R)
that your use of parentheses (and lack of spaces) will slow *me*
down significantly:
I will have to stop and think -- are these just bogus (and the
writer a novice), or did I miss something important?
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
| |
| spibou@gmail.com 2006-07-20, 4:01 am |
|
Pascal Bourguignon wrote:
> Paul Pluzhnikov <ppluzhnikov-nsp@charter.net> writes:
>
> But without parentheses, are you really sure that it's not interpreted
> as (i<(10&&0))<pid ?
I would be sure.
> Can you recite the 36 precedence levels of C?
C operators have 15 precedence levels.
> I'd write it as:
>
> for(i=0;(i<10)&&(0<pid);i++){
>
> to avoid slowing down thinking about irrelevant details such as
> precedence levels...
After some practice you don't have to think about it any more
than you have to think about syntax in a (human) language
you know well; it becomes second nature.
Spiros Bousbouras
| |
|
| > It sounds like you don't understand the difference between virtual
> addresses and physical addresses.
>
> Think of a virtual address as being like a room number in a building.
> Two different buildings can have a room number 101, but they aren't the
> same room, and the contents can be different. It's the same with
> virtual addresses and processes -- two different processes can have
> pointers with the same value, but they point to locations within that
> process's memory, so you get different values when you dereference them.
>
Thanks. You mean there is not a one-to-one relationship between a
virtual address and a physical address, but a one-to-many relationship.
Does the OS record and trace the memory space of each process
respectively?
Jack
| |
| Barry Margolin 2006-07-20, 9:59 pm |
| In article <1153416730.096983.304310@m73g2000cwd.googlegroups.com>,
"Jack" <junw2000@gmail.com> wrote:
> Thanks. You mean there is not a one-to-one relationship between a
> virtual address and a physical address, but a one-to-many relationship.
Yes.
> Does the OS record and trace the memory space of each process
> respectively?
I'm not sure what you're asking here. I suggest you do some background
research on virtual memory, then ask a more coherent question if
necessary. A good start would be:
http://en.wikipedia.org/wiki/Virtual_Memory
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
| |
| Gordon Burditt 2006-07-20, 9:59 pm |
| >> > It sounds like you don't understand the difference between virtual
>
>Yes.
Can't the relationship be many-to-many? For example, mmap()ing the
same file twice in the same process (at different virtual addresses).
Gordon L. Burditt
| |
| Barry Margolin 2006-07-20, 9:59 pm |
| In article <12c0e9oa39ad430@corp.supernews.com>,
gordonb.boura@burditt.org (Gordon Burditt) wrote:
>
> Can't the relationship be many-to-many? For example, mmap()ing the
> same file twice in the same process (at different virtual addresses).
Good point. In context, the important point was the distinction between
one-to-one and not-one-to-one. I didn't feel the need to get into the
other exceptions (e.g. shared memory results in many-to-one) when
addressing such a basic question.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
|
|
|
|
|