Unexpected characters preceding %! PS file magic bytes
| Mr. Uh Clem 2005-02-18, 3:59 pm |
| [My apologies if this question has been answered before in
this or other groups or elsewhere on the web. I'm having
trouble producing sensible Google searches that involve
odd character strings like "%!" and strings that frequently
appear in posts & pages.]
I'm working on some code similar to the Unix file command
which looks at the first few K of a file and tries to
recognize file type. Amongst the things it needs to
recognize is postscript, and if postscript, recognize DSC
so we can tell if there are inter-page dependencies or not.
The scan is being tripped up by unexpected characters
preceding the %! PS magic bytes. (PJL is recognized and
handled appropriately.) I can't seem to find any info
on any sort of data which could legitimately precede the
%! in the two books I have, "PostScript Language Reference
Manual, Second Edition" and "PostScript by Example". They
seem to imply that PS files should start with some flavor
of %!. (Can't claim to have read them cover to cover
though...) A 1994 c.l.p post is more emphatic.
<http://groups.google.com/groups?sel...@dvorak.amd.com>
The red book (1st ed.) sez on page 281: "The very first line
of every PostScript program (whether it is conforming or
nonconforming) should be a comment that begins with the
characters '%!'."
Yet yesterday, I did a print to file of a text file in
win98 notepad through a Lexmark driver and got:
^[%-12345X@PJL JOB
@PJL LJOBINFO USERID = "xxxxxx" HOSTID = "yyyyy"
@PJL SET RESOLUTION = 600
@PJL RDYMSG DISPLAY = "zzzzzz.txt - Notepad"
@PJL ENTER LANGUAGE = Postscript
^AM%!PS-Adobe-3.0 <<<<<<<<<<<<<<<<<--------?What is ^AM ? ????
%%Creator: LEXPS Version 7.2
%%Title: zzzzzz.txt - Notepad
%%LanguageLevel: 2
%%BoundingBox: 12 13 599 780
%%DocumentNeededResources: (atend)
%%DocumentSuppliedResources: (atend)
%%Pages: (atend)
%%BeginResource: procset Win35Dict
....
%%Trailer
SVDoc restore
userdict /SVDoc known {SVDoc restore} if
end
%%Pages: 2
% TrueType font name key:
% MSTT31c52a = 0000DCourier NewF00000000000000000000
% MSTT31c544 = 0000DCourier NewF00000000000001900000
%%DocumentSuppliedResources: procset Win35Dict 3 1
%%DocumentNeededResources: font Courier
%%EOF
^D^[%-12345X@PJL RDYMSG DISPLAY=""
^[%-12345X@PJL EOJ
^[%-12345X
where ^[ is escape (hex 1b), ^A is SOH (hex 01) and
^D is EOT (hex 04). I'm familiar with the EOT at
the end of PS files (even though I can't find anything
about it in my books or this group), but what's
up with SOH M preceding %!PS-Adobe?
Q1: Is there a convention somewhere for this prefix?
Q2: What other sort of cruft might I have to skip past
when looking for %! ?? (legitimately or just due
to common brain-deadness.)
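A minimal sketch of how a sniffer might step over such wrappers before testing for %!. The function name and the set of skipped items (the ^[%-12345X sequence, @PJL lines, a stray ^D, and the two-byte ^A M prefix) are assumptions drawn only from the dump in this post, not an exhaustive list of what drivers emit.

```python
UEL = b"\x1b%-12345X"  # Universal Exit Language sequence, shown above as ^[%-12345X


def find_ps_start(data: bytes) -> int:
    """Return the offset of the leading '%!' after known wrappers, or -1.

    Hypothetical helper: it only knows about the wrappers seen in the dump above.
    """
    pos = 0
    while pos < len(data):
        if data.startswith(UEL, pos):
            pos += len(UEL)
        elif data.startswith(b"@PJL", pos):
            # A PJL command runs to the end of its line.
            nl = data.find(b"\n", pos)
            if nl == -1:
                return -1
            pos = nl + 1
        elif data.startswith(b"\x01M", pos):
            pos += 2  # the ^A M prefix emitted by the Lexmark driver
        elif data[pos:pos + 1] in (b"\x04", b"\r", b"\n"):
            pos += 1  # stray ^D or line-ending bytes between wrappers
        else:
            break
    return pos if data.startswith(b"%!", pos) else -1
```

Feeding it the first few K of the print-to-file output above should return the offset of %!PS-Adobe-3.0; anything it does not recognize before %! yields -1 rather than a guess.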
What got me started on this was a customer complaining
that his PS job was not being recognized as DSC. What
he was trying to send for print started with:
%!
%% %%
%!PS-Adobe-3.0
%RBINumCopies: 1
%%Pages: (atend)
%%BoundingBox: 0 0 612 792
%%Creator: texttops/CUPS v1.1.19
%%CreationDate: dow dd mmm yyyy hh:mm:ss PM PST
%%Title: (stdin)
%%For: xxxxx
%%DocumentNeededResources: font Courier-Bold
%%+ font Courier
%%DocumentSuppliedResources: procset texttops 1.1 0
%%+ font Courier-Bold
%%+ font Courier
%%EndComments
The first two lines screw up a perfectly good DSC
print job. PLRM 2nd Ed says on p634: "All instances
of %! after the first instance are ignored by document
managers, although to avoid confusion, this notation
should not appear twice within the block of header
comments ..."
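To make the failure concrete, here is a small sketch of a strict first-line check of the kind described above. The function name and the three result labels are illustrative assumptions, not anything defined by the PLRM.

```python
import re

# A conforming file announces itself as %!PS-Adobe-N.N on its first line.
DSC_HEADER = re.compile(rb"%!PS-Adobe-(\d+\.\d+)")


def classify_first_line(data: bytes) -> str:
    """Classify by the first line only: 'dsc', 'ps', or 'unknown'."""
    first_line = data.splitlines()[0] if data else b""
    if DSC_HEADER.match(first_line):
        return "dsc"
    if first_line.startswith(b"%!"):
        return "ps"  # PostScript, but no DSC conformance claim on line one
    return "unknown"
```

With the customer's job, the bare %! on the first line wins, so a check like this reports plain PostScript and never looks at the %!PS-Adobe-3.0 two lines later.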
The customer says that they got this with an off
the shelf Redhat/Fedora CUPS configuration targeting
an HP 9000 printer. (Relaying via LPD through our server
first.) The extra two lines smell of something wrong in
the app/CUPS formatting foodchain. I'm concerned this
might be a standard stupidity and we'll have to provide
a way to deal with it. Anyone else see this? (Perhaps
there's a better place to ask about this aspect of my
problems.)
Thanks & Cheers!
--
Clem
"If you push something hard enough, it will fall over."
- Fudd's first law of opposition
| |
| Norbert Hahn 2005-02-18, 8:57 pm |
| "Mr. Uh Clem" <uhclem@DutchElmSt.invalid> wrote:
>I'm working on some code similar to the Unix file command
>which looks at the first few K of a file and tries to
>recognize file type. Amongst the things it needs to
>recognize is postscript, and if postscript, recognize DSC
>so we can tell if there are inter-page dependencies or not.
>
>The scan is being tripped up by unexpected characters
>preceding the %! PS magic bytes. (PJL is recognized and
>handled appropriately.)
Depending on the printer driver, several characters may
precede %!.
Some old Encapsulated PostScript files may start with the
magic characters '\305\320\323\306\036' (these are octal numbers).
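A hedged sketch of reading that binary header, assuming the layout given in Adobe's EPSF specification: a four-byte magic (hex C5 D0 D3 C6), then little-endian 32-bit offset/length pairs for the PostScript, Windows Metafile, and TIFF sections, then a 16-bit checksum, 30 bytes in all. On that reading the trailing \036 (decimal 30) is the low byte of the PostScript offset rather than part of the magic. The function name and dictionary keys are made up for illustration.

```python
import struct

EPS_BINARY_MAGIC = b"\xc5\xd0\xd3\xc6"  # octal \305\320\323\306


def parse_dos_eps_header(data: bytes):
    """Return section offsets/lengths from a DOS EPS binary header, or None."""
    if len(data) < 30 or not data.startswith(EPS_BINARY_MAGIC):
        return None
    # Six little-endian 32-bit fields follow the magic, then a 16-bit checksum.
    fields = struct.unpack_from("<6IH", data, 4)
    ps_off, ps_len, wmf_off, wmf_len, tiff_off, tiff_len, checksum = fields
    return {
        "ps_offset": ps_off,
        "ps_length": ps_len,
        "wmf_offset": wmf_off,
        "wmf_length": wmf_len,
        "tiff_offset": tiff_off,
        "tiff_length": tiff_len,
        "checksum": checksum,
    }
```

A scanner would then look for %! at ps_offset instead of at the start of the file.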
Some printers use a setup language in a similar way to PJL,
namely printers made by Kyocera. It's called Prescribe.
The magic string for Prescribe is !R! (but it can be
redefined to use a different letter). Prescribe instructions
end with "exit;". If Prescribe is used to switch the printer
to emulate an HP printer, then PJL may follow Prescribe.
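A rough sketch of stepping over such a block, under two loudly stated assumptions: the default recognition letter R is in use, and "exit;" is matched case-insensitively and does not occur inside an earlier Prescribe command. The function name is hypothetical.

```python
import re

# Assumes the default command-recognition letter 'R'; it can be redefined.
PRESCRIBE_START = b"!R!"
# Assumption: the terminating command may be written in either case.
PRESCRIBE_EXIT = re.compile(rb"exit\s*;", re.IGNORECASE)


def skip_prescribe(data: bytes, pos: int = 0) -> int:
    """Return the offset just past a leading Prescribe block, or pos if none."""
    if not data.startswith(PRESCRIBE_START, pos):
        return pos
    match = PRESCRIBE_EXIT.search(data, pos + len(PRESCRIBE_START))
    # An unterminated block is left alone rather than guessed at.
    return match.end() if match else pos
```

Whatever follows the returned offset (PJL, or %! directly) can then be handed to the usual checks.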
[snip]
>^AM%!PS-Adobe-3.0 <<<<<<<<<<<<<<<<<--------?What is ^AM ? ????
I checked the PS driver for a Lexmark color laser printer on a Windows XP
system and voila, it starts with ^AM.
You may dig as well in
http://cvs.sourceforge.net/viewcvs.....pl?rev=1.1.2.2
HTH
Norbert
| |
| Mr. Uh Clem 2005-02-19, 3:57 pm |
| Norbert Hahn wrote:
> "Mr. Uh Clem" <uhclem@DutchElmSt.invalid> wrote:
>
>
>
> Depending on the printer driver, several characters may
> precede %!.
Wonder if it's reasonable to scan for %!... in just the first line.
Or should I scan deeper?
> Some old Encapsulated PostScript files may start with the
> magic characters '\305\320\323\306\036' (these are octal numbers).
"EPSF", all with the MS bit on, followed by an RS (record
separator) character. I presume this IS the magic, not a
prefix to %!...
> Some printers use a setup language in a similar way to PJL,
> namely printers made by Kyocera. It's called Prescribe.
> The magic string for Prescribe is !R! (but it can be
> redefined to use a different letter). Prescribe instructions
> end with "exit;". If Prescribe is used to switch the printer
> to emulate an HP printer, then PJL may follow Prescribe.
> [snip]
Thanks for reminding me of Prescribe. I'd seen it a long
time ago and forgotten about it.
>
>
>
> I checked the PS driver for a Lexmark color laser printer on a Windows XP
> system and voila, it starts with ^AM.
Wonder if other makes/drivers do this?
>
> You may dig as well in
> http://cvs.sourceforge.net/viewcvs.....pl?rev=1.1.2.2
>
> HTH
> Norbert
>
Indeed it does, thanks!
--
Clem
"If you push something hard enough, it will fall over."
- Fudd's first law of opposition
| |
| Helge Blischke 2005-02-19, 3:57 pm |
| Mr. Uh Clem wrote:
> [snip]
> ^AM%!PS-Adobe-3.0 <<<<<<<<<<<<<<<<<--------?What is ^AM ? ????
> [snip]
The ^AM switches to TBCP (Tagged Binary Communications Protocol) as
defined by Adobe. It is not part of the PostScript language in any
respect but was originally designed to permit binary octets to be
passed over serial lines that need certain control characters for
flow control etc., i.e. as part of the physical layer protocol. By the
way, this protocol is switched off by the "universal language
switching control thingy" ^[%-12345X .
In a well-designed printing workflow, this should be issued by the very
last component that feeds data to the real printer, e.g. the CUPS
backend, but surely it is legitimate to generate it anywhere in the
workflow chain.
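A rough sketch of undoing TBCP framing so that a scanner sees plain PostScript. It assumes the quoting rule as commonly described for Adobe's protocol: after the ^A M start sequence, a ^A byte escapes the following byte, which is recovered by XOR with hex 40, and the ^[%-12345X sequence ends the protocol. The function name is hypothetical and the treatment of the terminating sequence is simplified; check the Adobe specification before relying on it.

```python
TBCP_START = b"\x01M"          # ^A M
UEL = b"\x1b%-12345X"          # ^[%-12345X, which switches the protocol off


def tbcp_decode(data: bytes) -> bytes:
    """Strip TBCP framing from data starting with ^A M; else return it unchanged."""
    if not data.startswith(TBCP_START):
        return data
    # Everything from the terminating sequence onward is outside the protocol.
    end = data.find(UEL, len(TBCP_START))
    if end == -1:
        end = len(data)
    body, tail = data[len(TBCP_START):end], data[end:]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == 0x01 and i + 1 < len(body):
            out.append(body[i + 1] ^ 0x40)  # assumed quoting rule: ^A, then byte XOR 0x40
            i += 2
        else:
            out.append(body[i])
            i += 1
    return bytes(out) + tail
```

For file-type sniffing alone, skipping the two start bytes is usually enough; full decoding matters only if the scanner has to read binary data further into the job.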
Helge
--
Helge Blischke
Softwareentwicklung
SRZ Berlin | Firmengruppe besscom
http://www.srz.de
tel: +49 30 75301-360
| |
| Mr. Uh Clem 2005-02-22, 8:59 pm |
| Helge Blischke wrote:
> Mr. Uh Clem wrote:
....
>
>
> The ^AM switches to TBCP (Tagged Binary Communications Protocol) as
> defined by Adobe. It is not part of the PostScript language in any
> respect but was originally designed to permit binary octets to be
> passed over serial lines that need certain control characters for
> flow control etc., i.e. as part of the physical layer protocol. By the
> way, this protocol is switched off by the "universal language
> switching control thingy" ^[%-12345X .
Thanks for pointing me at TBCP. I trust you're saying that a _preceding_
UEL disables TBCP...
> In a well-designed printing workflow, this should be issued by the very
> last component that feeds data to the real printer, e.g. the CUPS
> backend, but surely it is legitimate to generate it anywhere in the
> workflow chain.
>
> Helge
>
Doesn't seem reasonable at all for apps and non-final spoolers to
be adding stuff like this. Windows is terrible about this with
PJL, and I've seen CUPS, in jobs from this customer, format PS that
elicits replies from the printer, even though it is not the final spooler.
The customer ran some more tests and gets things like
%!
%% %%
<</ManualFeed false>>setpagedevice
<</Duplex true /Tumble false>>setpagedevice
%!PS-Adobe-3.0
%%Requirements: duplex
in the file. (After the PJL header)
"This is cups from FC1 configured to use the PostScript
driver with Duplexing enabled."
I'm all the more convinced that the CUPS PS driver is
adding the prefix (4 lines in this case.) Would it
be valid/reasonable to ignore the Redbook and accept
this as DSC despite the %! which the Redbook says should
take precedence? Or are there official mitigating
rules? How far into the file should I go looking
for proper DSC?
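One possible shape for a more forgiving check, sketched under assumptions: look for a %!PS-Adobe-N.N line within the first few lines instead of only on line one. The function name and the 20-line cutoff are arbitrary and hypothetical; nothing in the Redbook sanctions this, so it is a policy choice rather than a rule.

```python
import re

DSC_LINE = re.compile(rb"%!PS-Adobe-\d+\.\d+")


def find_dsc_header(data: bytes, max_lines: int = 20):
    """Return the 0-based line index of the first %!PS-Adobe-N.N line, or None.

    max_lines is an arbitrary cutoff chosen for illustration, not a spec value.
    """
    for index, line in enumerate(data.splitlines()[:max_lines]):
        if DSC_LINE.match(line):
            return index
    return None
```

A caller could treat index 0 as conforming DSC and any later index as "DSC behind a nonconforming prefix", then decide per site whether to honor it.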
As far as trying to suppress "spoolerisms", I guess
I need to find an appropriate CUPS-oriented newsgroup
to see if there is a way to tell CUPS to format for
printer type X, but send it via some other *system*.
Thanks again!
--
Clem
"If you push something hard enough, it will fall over."
- Fudd's first law of opposition
| |
| Helge Blischke 2005-02-23, 4:00 pm |
| Mr. Uh Clem wrote:
>
> Helge Blischke wrote:
>
> ...
>
>
> Thanks for pointing me at TBCP. I trust you're saying that a _preceding_
> UEL disables TBCP...
>
>
> Doesn't seem reasonable at all for apps and non-final spoolers to
> be adding stuff like this. Windows is terrible about this with
> PJL and I've seen CUPS jobs from this customer format PS that elicits
> replies from the printer, even though it is not the final spooler.
Windows is designed to behave as the very last instance between the
app and the real printer (see that weird default "optimize for speed"
option).
The modified PostScript generators by ESP (as far as available) produce
(fairly) DSC-compliant output.
>
> The customer ran some more tests and gets things like
>
> %!
> %% %%
> <</ManualFeed false>>setpagedevice
> <</Duplex true /Tumble false>>setpagedevice
> %!PS-Adobe-3.0
> %%Requirements: duplex
>
> in the file. (After the PJL header)
>
> "This is cups from FC1 configured to use the PostScript
> driver with Duplexing enabled."
>
> I'm all the more convinced that the CUPS PS driver is
> adding the prefix (4 lines in this case.) Would it
> be valid/reasonable to ignore the Redbook and accept
> this as DSC despite the %! which the Redbook says should
> take precedence? Or are there official mitigating
> rules? How far into the file should I go looking
> for proper DSC?
The CUPS pstops filter prepends the PJL stuff as defined in the PPD
to the PostScript job and also adds the TBCP control code at the
end of the PJL stuff.
>
> As far as trying to suppress "spoolerisms", I guess
> I need to find an appropriate CUPS-oriented newsgroup
> to see if there is a way to tell CUPS to format for
> printer type X, but send it via some other *system*.
>
> Thanks again!
>
Helge
--
Helge Blischke
Softwareentwicklung
SRZ Berlin | Firmengruppe besscom
http://www.srz.de
tel: +49 30 75301-360