PERL CGI Beginners archive, July 2007
text file upload ... work around wide characters?
Michael Higgins

2007-05-23, 6:56 pm

Hello, List-ers --

I've come across a problem and was unsure where to ask, so I subscribed here. I
upload a file through a browser. It's a '.txt' file, but it arrives as
text/html.

However, I've found that some hyphen- and single-quote-like characters in
this text file are from a higher code point... or something. What _seems_ to
happen is that the browser strips them, and my script isn't getting all the
info to dump into my database.

Looking at the headers going out from my browser, they show the whole of the
file, less the 'wide' chars. On the other end, Apache/CGI sees the end of
the file early, or something.

Could it be that the browser gets a null byte, or something in this 'text'
file, and truncates the file?
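One way to test the null-byte theory before blaming the browser is to inspect the raw bytes of the file locally. The minimal Perl sketch below (the `scan_bytes` name is hypothetical, not from the thread) prints the offset of every NUL byte and every byte at or above 0x80. Assuming the file came from a Windows editor, isolated high bytes such as 0x96 or 0x92 would suggest Windows-1252, while runs like E2 80 93 would point to UTF-8; an absence of 0x00 bytes would rule out the NUL theory for the file itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, byte value] for every NUL byte and
# every non-ASCII byte (0x80-0xFF) in a string of raw bytes.
sub scan_bytes {
    my ($bytes) = @_;
    my @hits;
    while ($bytes =~ /([\x00\x80-\xff])/g) {
        # pos() points just past the match, so subtract one for its offset.
        push @hits, [ pos($bytes) - 1, ord($1) ];
    }
    return @hits;
}

# Usage: perl scan.pl file.txt
if (@ARGV) {
    # :raw so no layer translates line endings or bytes.
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $bytes = do { local $/; <$fh> } // '';
    close $fh;
    printf "offset %d: 0x%02X\n", @$_ for scan_bytes($bytes);
}
```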

And... has anyone here figured this one out? I can't be the first to have
this kind of problem. ;-)

Cheers,


Michael Higgins


Michael Higgins

2007-05-23, 6:56 pm

>> -----Original Message-----

[8<]
> -----Original Message-----
> From: Scott Statland [mailto:statland@msn.com]
>
> The characters that you are describing may need to be
> escaped or have their codes entered.
> It sounds like they may have special meanings in either
> the scripting language or in the HTML output.


Hmm.

I guess my question wasn't clear. The issue is a file upload that is tagged
as text/html but has wide characters in it. The file doesn't make it out of
the browser right AFAICT. (If this is obviously incorrect, please post the
correction!)

A little more pain and research led me to find this:

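use strict;
use warnings;
# Replace every non-ASCII character with an HTML numeric reference (&#N;).
# With no :encoding layer, each non-ASCII *byte* becomes its own reference.
open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
while (my $line = <$fh>) {
    $line =~ s/([^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;
    print $line;
}
close $fh;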

... a helpful code snippet which, applied to my files before they are uploaded,
gives me a new text file with lines like: "Regarding the box – the
driver wouldn’t".

The upshot is that the file is uploaded fully, and when viewed in a browser the
characters are displayed correctly. Duh.
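Because the snippet reads bytes, not characters, a Windows-1252 en dash (byte 0x96) comes out as &#150;. Browsers happen to render references in the 128-159 range as Windows-1252 characters, which is likely why the result looked right. A minimal sketch that decodes first, assuming the file really is Windows-1252 (`cp1252`), produces the proper Unicode code points instead; the `to_numeric_refs` name is hypothetical, and the encoding argument must be changed if the file is actually UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes from a known encoding, then replace
# each non-ASCII *character* (not byte) with an HTML numeric reference.
sub to_numeric_refs {
    my ($bytes, $encoding) = @_;
    my $text = decode($encoding, $bytes);    # bytes -> Unicode characters
    $text =~ s/([^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# Assumption: 0x96 and 0x92 are the cp1252 en dash and right single quote.
print to_numeric_refs("the box \x96 the driver wouldn\x92t", 'cp1252'), "\n";
# prints: the box &#8211; the driver wouldn&#8217;t
```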

Now, if I could only get the browser to fix it up like this when sending...
rather than what it was doing. Since it's going to a *nix box, I don't care
about the text/binary thing, right? I guess I could test from a 'nix Firefox
and see if the behaviour is different.

Does anyone have a thought on why the browser upload fails to accommodate
text with wide chars? I don't know how it determines the MIME type... maybe
if the first char were wide, it would go up as a different MIME type?
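Whatever MIME type the browser attaches, the CGI script still has to guess the file's character encoding itself. A minimal server-side sketch, assuming uploads are only ever UTF-8 or Windows-1252 (`cp1252`), tries a strict UTF-8 decode and falls back to cp1252 if that fails. The `decode_upload` name is hypothetical, and this is a two-way guess, not a general charset detector.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns (decoded text, encoding name).
# Assumption: the upload is either UTF-8 or Windows-1252.
sub decode_upload {
    my ($bytes) = @_;
    # FB_CROAK dies on malformed UTF-8; LEAVE_SRC keeps $bytes unmodified.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return ($text, 'UTF-8') if defined $text;
    # Every byte maps to some character in cp1252, so this fallback succeeds.
    return (decode('cp1252', $bytes), 'cp1252');
}

my (undef, $enc) = decode_upload("wouldn\x92t");
print "$enc\n";    # prints: cp1252
```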

Cheers,


Michael Higgins


Mumia W.

2007-05-23, 6:56 pm

On 05/23/2007 04:16 PM, Michael Higgins wrote:
> [8<]


What browser is creating the problem? What O/S is that browser running on?

MSIE reportedly applies content-sniffing heuristics to guess a file's type,
so I suspect the browser is MSIE.

Evidently the .txt file looks like an HTML file. If it truly is an HTML
file, you might be able to fix the problem by specifying the character
set in a META tag.

Or you could compress the file with gzip or zip to convince MSIE to
leave the file alone when uploading it.
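If the file is gzipped before uploading, the CGI script has to unpack it again. A minimal sketch using the core IO::Compress::Gzip and IO::Uncompress::Gunzip modules: the hypothetical `maybe_gunzip` helper checks for the gzip magic bytes (1f 8b) and decompresses only in that case, passing other uploads through unchanged. It assumes gzip specifically; a .zip archive would need different handling.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: gunzip the upload if it starts with the gzip magic
# bytes, otherwise return it unchanged.
sub maybe_gunzip {
    my ($bytes) = @_;
    return $bytes unless substr($bytes, 0, 2) eq "\x1f\x8b";
    my $plain;
    gunzip(\$bytes => \$plain) or die "gunzip failed: $GunzipError";
    return $plain;
}

# Round trip: the high bytes survive compression untouched.
my $original = "Regarding the box \x96 the driver wouldn\x92t\n";
gzip(\$original => \my $compressed) or die "gzip failed: $GzipError";
print maybe_gunzip($compressed) eq $original ? "round trip ok\n" : "mismatch\n";
```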


Copyright 2008 codecomments.com