Author Re: [PHP-DOC] On serializing the id index
Richard Quadling

2007-11-30, 8:02 am

If, at the least, the file was a simple $a_Array, which could be
written using ...

file_put_contents('./index.cache', '<?php $a_Index = ' .
var_export($a_Index, True) . '; ?>');

sort of thing, then you could just include it and the index would be
available instantly. No need for file I/O parsing.
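
A rough sketch of the technique above, not code from the thread. The sample entries and the temporary file path are made up for illustration, and the cache file uses `return` instead of assigning `$a_Index`, so the caller picks the variable name.

```php
<?php
// Hypothetical sample data; the real index structure is not shown in the thread.
$a_Index = array(
    'function.strlen' => array('title' => 'strlen', 'children' => array()),
    'ref.strings'     => array(
        'title'    => 'String Functions',
        'children' => array('function.strlen'),
    ),
);

// Assumed location for the cache file.
$cacheFile = sys_get_temp_dir() . '/index.cache.php';

// var_export(..., true) returns the array as PHP source code in a string.
file_put_contents($cacheFile, '<?php return ' . var_export($a_Index, true) . ';');

// Loading is a plain include; the value returned by the file is the array.
$loaded = include $cacheFile;
```

This only round-trips plain values. Entries that hold references to each other come back as independent copies.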



On 30/11/2007, Edward Z. Yang <edwardzyang@thewritingpot.com> wrote:
> I understand this is the least of our worries right now, but I'd like to
> get the indexer cache working so I can get karma for PhD. :-D
>
> The main troubles with serialize are that it 1. wastes space by not
> preserving internal references and 2. requires lots of memory. I recall
> SQLite being proposed as a possible solution, but since the entire index
> is loaded into memory I don't think that's necessary: just an easily
> parseable file format.
>
> So, we have a file, with each id separated by a newline, and the fields
> separated by tabs (both of which should never occur in the fields, if
> they do I'll use a control character or something). children is
> collapsed into a list of IDs.
>
> Reassembling is as simple as parsing the file line by line, constructing
> the ID array from the fields. Then the children are reconstituted by
> running through $IDs a second time, replacing them with references to
> the appropriate indexes.
>
> Thoughts?
>
> --
> Edward Z. Yang GnuPG: 0x869C48DA
> HTML Purifier <http://htmlpurifier.org> Anti-XSS Filter
> [[ 3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA ]]
>



--
-----
Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"

Edward Z. Yang

2007-11-30, 7:02 pm

Richard Quadling wrote:
> If, at the least, the file was a simple $a_Array, which could be
> written using ...
>
> file_put_contents('./index.cache', '<?php $a_Index = ' .
> var_export($a_Index, True) . '; ?>');
>
> sort of thing, then you could just include it and the index would be
> available instantly. No need for file I/O parsing.


That's an interesting technique. However, PHP's source code parser is
quite a bit more complicated than what our index format needs, so it's
quite possible that our custom format's simplicity would offset the
performance PHP's parser gains from being written in C (I suppose we'd
have to benchmark to find out, but I remember MediaWiki's developers
talking about this).

The other implementation difficulty is getting the references to point
to the right locations in var_export's output, which is probably
impossible. That means we would have to make a copy of $IDs and process
it into a non-referenced structure before exporting (alternatively,
reversibly de-reference and re-reference the children's entries).
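
For comparison, here is a minimal sketch of the tab-separated format with the two-pass rebuild, under stated assumptions: each entry has only a hypothetical `title` field plus `children`, children are keyed by id, and ids contain no tabs, newlines, or commas (the comma-separated child list is my choice, not something specified in the thread).

```php
<?php
// Hypothetical index; children are held as references to other entries.
$IDs = array(
    'book.strings'    => array('title' => 'Strings', 'children' => array()),
    'function.strlen' => array('title' => 'strlen',  'children' => array()),
    'function.substr' => array('title' => 'substr',  'children' => array()),
);
$IDs['book.strings']['children']['function.strlen'] =& $IDs['function.strlen'];
$IDs['book.strings']['children']['function.substr'] =& $IDs['function.substr'];

// Write one entry per line, fields separated by tabs,
// with children collapsed into a list of ids.
function write_index(array $IDs, $file) {
    $lines = array();
    foreach ($IDs as $id => $entry) {
        $childIds = implode(',', array_keys($entry['children']));
        $lines[]  = implode("\t", array($id, $entry['title'], $childIds));
    }
    file_put_contents($file, implode("\n", $lines) . "\n");
}

// First pass: build entries holding child ids.
// Second pass: replace each child id with a reference to that entry.
function read_index($file) {
    $IDs = array();
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        list($id, $title, $children) = explode("\t", $line);
        $IDs[$id] = array(
            'title'    => $title,
            'children' => $children === '' ? array() : explode(',', $children),
        );
    }
    foreach (array_keys($IDs) as $id) {
        $refs = array();
        foreach ($IDs[$id]['children'] as $childId) {
            $refs[$childId] =& $IDs[$childId];
        }
        $IDs[$id]['children'] = $refs;
    }
    return $IDs;
}

$file = tempnam(sys_get_temp_dir(), 'idx');
write_index($IDs, $file);
$loaded = read_index($file);
```

Because the children are references again after loading, changing `$loaded['function.strlen']` is also visible through `$loaded['book.strings']['children']['function.strlen']`. Whether this beats an included var_export file would still need the benchmark mentioned above.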

--
Edward Z. Yang Portable GnuPG: 0x995A2C84
HTML Purifier <http://htmlpurifier.org> Anti-XSS Filter
[[ C8D5 9E3C 15AD 1467 5561 2C0E 719A 2D9D 995A 2C84 ]]
This Message Courtesy of Thunderbird Portable