A
form
in an HTML document (Web page) can contain an input
element
with type="file"
. This may let the user include one or
more files into the form submission.
The form is often processed so that such files are stored onto
the disk of the Web server; this is why file input (or file submission)
is often called “file upload.”
File input opens interesting possibilities, but browser support is
still limited and generally of poor quality even in newest versions.
Moreover, users are often puzzled by it, since most people
use file input rather
rarely.
This is a legacy document, with many references to outdated browser versions. It does not cover features such as the File API or the new features in file upload in HTML5.
This document presents
enctype
attribute;
Submitting several files?;
Setting the default filename;
Getting the original name;
The size
attribute;
Setting restrictions on the file size;
Filtering (through a file type filter);
The status of RFC 1867.
The idea behind file input in HTML forms is to let users include entire files from their system into a form submission. The files could be text files, image files, or other data. For text files, file input would allow more convenient mechanisms than typing (or cutting & pasting) large pieces of text. For binary data, such as images, file input would be not just more convenient but usually the only practical way. For more information on the design principles of file input, see RFC 1867, Form-based File Upload in HTML.
Writing an HTML form with a file input field is rather simple. The difficult thing is actually to find or write a server-side script which can do something useful when it receives data in such a format. And the really difficult thing is to make such processing robust and controlled so that all data is processed properly and so that someone won’t e.g. fill your server’s disk space with gigabytes of junk, by ignorance or by malevolence.
You need to know the general basics of writing HTML forms; if you need links to tutorials and references on forms, consult How to write HTML forms. Then, what you need to do in HTML is to write a form so that
the action attribute refers to a server-side script which is capable of handling submissions containing files or, technically speaking, being in multipart/form-data format; as explained below, don’t even dream about using mailto: URLs in action attributes, in this context or otherwise!
the form element has the attribute method="post"
the form element has the attribute enctype="multipart/form-data"
the form contains a field <input type="file" name="somename" size="chars">
The size attribute is optional, but setting it to some relatively large value (say 40) probably helps the user, since the default width of the box in current browsers is rather narrow for typical filenames. (See notes on the size attribute.)
Minimally, the form needs to contain a submit element too. It may also contain any other fields you like, and explanatory texts, images, etc.
A common problem with file input
in forms is that form data gets sent but only the name
of the file is included. The reason is typically that the form
element does not contain the attributes mentioned above.
Since browser support for file input is still problematic, consider providing alternative methods of submitting data, too.
It is hopefully evident that what happens in file input is the submission of a copy of the file content. The file on the user’s disk remains intact, and the server-side script cannot change it, only the copy of the data.
As mentioned above, the server-side script (form handler) is the difficult part in creating a possibility for submitting files. There are useful brief notes on that in the FAQ entry, but it is a difficult programming issue, and outside the scope of this document of mine. I just wish to emphasize, in addition to the security issues discussed below, that what happens to the data after submission is in the hands of the server-side script. It could “upload” it, i.e. save it onto the server’s disk under some name, but it might just as well process the data only by extracting some information from it, or send the data by E-mail somewhere, or even send it to a printer. For example, the WDG HTML Validator provides, as one alternative, a page containing a form for submitting a file for validation.
There are different server-side techniques for
processing forms, so you need to consult documentation applicable to
the technique you use, which is usually dictated by the characteristics
of the server software. In particular, if you use
CGI, it can be useful
to check section
Programs and Scripts: Perl: File Uploading in
CGI Resource Index.
(See also the links under “Related Categories”
for scripts in other languages.)
You might find a script suitable for your purposes, or at least ideas for
writing your own script.
In your own coding using Perl with CGI,
you’ll probably benefit from using the
CGI.pm
module; see especially
section
Creating a file upload field
in its documentation, and my
Fool’s Guide to CGI.pm.
As another example, if PHP is what you use, see section Handling file uploads in the PHP Manual. For ASP, see e.g. Pure ASP File Upload by Jacob Gilley.
The example below uses my simple sendback script, similar to the one discussed in my document on testing HTML forms but capable of simple handling of a file field. It simply echoes back the data it gets, but presented so that your browser will display it nicely; for a file field, only the first 40 octets (bytes) are shown.
The HTML markup is:
<form action="http://jkorpela.fi/cgi-bin/echo.cgi"
enctype="multipart/form-data" method="post">
<p>
Type some text (if you like):<br>
<input type="text" name="textline" size="30">
</p>
<p>
Please specify a file, or a set of files:<br>
<input type="file" name="datafile" size="40">
</p>
<div>
<input type="submit" value="Send">
</div>
</form>
And on your browser, with its current settings, and as possibly affected by my stylesheet, this is what the form looks like
RFC 1867 describes, in section 3 Suggested implementation, how file input was intended to take place in a typical situation:
3.1 Display of FILE widget
When a[n] INPUT tag of type FILE is encountered, the browser might show a display of (previously selected) file names, and a “Browse” button or selection method. Selecting the “Browse” button would cause the browser to enter into a file selection mode appropriate for the platform. Window-based browsers might pop up a file selection window, for example. In such a file selection dialog, the user would have the option of replacing a current selection, adding a new file selection, etc. Browser implementors might choose let the list of file names be manually edited.
If an ACCEPT attribute is present, the browser might constrain the file patterns prompted for to match those with the corresponding appropriate file extensions for the platform.
Upon form submit, the contents of the files would then
be included into the data set sent, as defined by the specification
of the
multipart/form-data
data type (data format, data encoding).
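To make the multipart/form-data format more concrete, the following Python sketch builds the kind of request body a browser might send for a form with one text field and one file field. It is only an illustration under stated assumptions: the function name encode_multipart, the field names, and the fixed boundary string are all hypothetical, and real browsers choose their own boundaries and may add further headers.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Return (content_type, body) for a multipart/form-data submission.

    fields: dict mapping field name -> text value
    files:  dict mapping field name -> (filename, content bytes, media type)

    Simplification: names and filenames are assumed not to contain
    double quotes or line breaks, so no escaping is done here.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, content, media_type) in files.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode("utf-8"),
            f"Content-Type: {media_type}".encode("ascii"),
            b"",
            content,  # the file content is included as such
        ]
    # The closing delimiter has two extra hyphens at the end.
    lines += [delimiter + b"--", b""]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


content_type, body = encode_multipart(
    {"textline": "hello"},
    {"datafile": ("foo.txt", b"abc", "text/plain")},
    boundary="XyZ",  # fixed only to make the output predictable
)
print(content_type)
print(body.decode("ascii"))
```

Each field becomes one part with its own small header block, which is why a file can carry its own content type and name inside the larger message.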
Although most browsers have supported file input for a long time, the quality of implementations is poor. Therefore users easily get confused with file input.
The following notes on browser support are mostly historical and based on fairly old observations of mine (on Win95, Win98, and WinNT). These notes are followed by more interesting notes on users’ problems, especially those caused by the poor quality of support on modern browsers.
IE 3.0
displays an input box and
lets the user type a filename there—and it sends the
name as part of the form data!
Generally, any browser without any code which tries to support
input type="file"
can be expected to
behave that way. (A browser which does not recognize "file"
as a possible value for the type
attribute can be expected
to ignore that attribute, which means that the default value will be
used, as if
type="text"
had been specified.)
IE 4 has an input box and a “Browse” capability, and it actually sends the file content, but it still allows one file only to be selected. The “Browse” function display is unfiltered, i.e. all files which are normally visible are selectable. There does not seem to be any improvement in this respect in IE 5, or IE 6, or IE 7.
According to Netscape’s documentation on file input, support to it exists already in Netscape 2.
Netscape 4 support to file input has a “Browse” capability, too, but the browsing has by default a filter which limits selectability to “HTML files”. The user can manually change this, though it is questionable how familiar users are with such things. Only one file can be specified. There does not seem to be any improvement in this respect in Netscape 4.5. Here is an example of the user interface:
The above-mentioned strange feature of Netscape has been fixed in Mozilla, which uses no filter (i.e. displays all files); on the other hand it (at least in several versions) gives no user option to switch to a filtered view!
Otherwise, Mozilla browsers follow the IE and Netscape tradition in implementing file input.
Opera supports file input rather well. It provides a “Browse” menu, though the button for activating it carries the label “...”, which might be somewhat confusing. It lets the user specify several files from the menu:
It isn’t perfect though. The Browse window is rather small, and it is
impossible to pick up several ranges, i.e. you must click on the files individually
unless you want to select just one contiguous range.
And the box for file names is quite small too, and its size is not
affected by the size
attribute.
See also notes on setting the default filename.
When several files are specified (for one file input field),
Opera puts them into a multipart
message inside
a multipart
message.
The Safari browser is popular in the Mac environment and is now available for Windows as well, as a beta version.
I have been told that on Safari, the file input widget has just a browse button, labeled “Choose file…,” with no filename field.
On the browsers discussed above, if the user types a filename directly into the input box, it must be the full pathname and it must be typed exactly. If the input is not a name of an existing file (e.g. due to a typo), then the form will be sent as if an empty file had been specified (though with the name given by the user), and no warning is given. People who encounter file input for the first time might be expected to get very confused, since the filename box appears first and looks like an area where the user should type something.
The user probably often wishes to view the
contents of files in the dialog, since it is difficult to
select the file on the basis of its name only.
On Windows systems, the browsers discussed here seem to use
widgets where normal clicking on a file icon selects it, and to
open it (in some program) one needs to use right click and
select a suitable action. I guess most users won’t find that out
without being helped. The following screen capture presents
the dialogue on IE 4 (on WinNT) in a situation where the user
has right clicked on an icon and an action menu has popped up and
the user is about to select the Open action (which would,
in this case, probably open the .jpg
file in a graphics
program or in a new browser window).
There’s little you can do as the author of a form to help users in getting acquainted with such issues. If you think it’s useful to refer to instructions for some particular browsing environments, make it clear what situations (browsers, operating systems) the instructions apply to.
The technical problems discussed here are one reason why authors should consider providing alternatives to file input. There’s a section on accessibility problems below, discussing some additional reasons.
All the browsers mentioned above use essentially
similar appearance for the
widget used to implement a file input element:
a text input box for the filename looks similar to
normal text input elements (<input type="text">
), and
the Browse button
resembles submit buttons
(thus, is often grey), and it has
the text “Browse” or
its equivalent in another language.
That text is under the control of the browser, not the author. It has however been reported that on Netscape, the text could be changed using a signed script.
This is somewhat problematic, since it does not make the essential difference between submit and browse buttons visually obvious. Compare this with similar problems with reset buttons.
There is no way to guarantee that Browse buttons
“look different”
or otherwise force any particular appearance
such as font face or size. See
the document
Affecting the presentation
of form fields on Web pages
for an overview and examples. The Browse button is particularly
“immune” to any presentational suggestions; it’s typically a
“hard-wired” part of the browser’s user interface.
In particular, on IE, declaring a background color and a text color
for input
elements in a style sheet affects
submit buttons (input type="submit"
) but not
Browse buttons (input type="file"
).
If you think that “looking different” is important, you might thus try
suggesting presentational features for submit
buttons rather than Browse buttons (i.e., for input type="file"
elements). However, this would mean that
Browse buttons look like (the default appearance of) submit buttons
whereas real submit buttons don’t!
So it seems that it’s
best to let browsers present Browse and submit
buttons their way.
The input box for the filename, on the other hand,
seems to be affected by similar factors as normal text input boxes.
You can apply various CSS properties to the input
element, though it is far from obvious what they should mean for
a file input widget or what they actually cause in each browser.
Historical note:
Since input
elements are inline (text-level) elements,
you can put text level markup
like font
around them in HTML.
However, such markup is often ignored when rendering form fields.
For example,
<font size="4" face="Courier"><input type="file" ...></font>
might
increase the font size and set the font to Courier. Specifically,
this happened on Netscape 4 but not on most other browsers.
(As a side effect, on Netscape 4, such
a font size change affected the dimensions of the Browse
button but not the font size of the text “Browse”.
Note that
if you included a color
attribute there,
Netscape ignored it.)
You could suggest presentational properties
in a style sheet too, e.g.
<input type="file" ... style="color:#f00; background:#ccc">
and these in turn would be ignored e.g. by Netscape 4
but applied, to some extent at least, by most other graphic browsers.
It is difficult to say how CSS rules should
affect the widget, since it is an open question whether e.g.
the text of the Browse button (which is not part of the
textual content of the HTML document)
should be formatted according to the font properties of the
input
element. (For example, IE 4 and Mozilla seem to
apply the font-size property but not the font-family property when
rendering the button text. IE 6 applies font-family too.)
The following example demonstrates how your browser treats a file input element where we suggest presentational properties both in HTML and in CSS:
The example has the HTML markup
<b><tt><big><input type="file" ...></big></tt></b>
and the following CSS declarations applied to that input
element:
color:#630; background:#ffc none; font-size:160%; font-family:Courier,monospace; font-weight:bold
Such suggestions might help in making it clearer to users that
there is a special input box. But try to avoid making it look
too special, since there is then the risk of not getting
intuitively recognized as an input box at all.
At Quirksmode.org, there is a longish article that discusses fairly complex CSS techniques for changing the appearance of file input elements, in a sense: Styling an input type="file". I would however advise against any substantial changes in the appearance. Any esthetic improvement over browser defaults (in addition to being a matter of taste) has a price: it makes even the experienced user uncertain of what the widget is.
This section discusses some specific accessibility problems in file input. For an overview of what accessibility is and why it is important, please refer to the Guide to Web Accessibility and Design for All.
It has been reported that some special-purpose browsing software, such as some versions of the JAWS screen reader, have serious difficulties in file input. This is understandable, since the common implementation in browsers is oriented towards visual interaction.
Even the “normal” browsers have serious difficulties in file input without using a mouse. (There are different reasons, including physiological and neurological problems, why the user may need to work without a mouse or other pointing device.) In Internet Explorer 6, you can select the Browse button by tabbing, but if you try to use the keyboard to activate it, hitting the Enter key, the browser submits the form instead! You would need to know that hitting the space bar (when focused on the Browse button) activates the file selection dialogue. Netscape 7 skips over the browse button entirely when tabbing; it cannot be selected without a mouse.
Not surprisingly, on Opera things work reasonably. The user can select the Browse button using the tab key and activate it by pressing the enter key, then select a file for upload from the file system; you would use the arrow keys to move around in the file selection.
On the Lynx text browser, at least on Lynx 2.8.4 on Unix, there is no Browse button, and there is no dialogue for accessing the computer’s file system. Thus, the user needs to know the exact path name and syntax to type in the file name for upload, as is apparently also the case for IE and Netscape.
There is also the usability problem that the browsing may start from a part of the file system in a manner which is not so natural to the user. The initial selection might be e.g. that of the directory where the Web browser itself resides! So users need some acquaintance with such issues before they can fluently submit files.
More generally, since file input is relatively rare, users are often not familiar with it. They might not recognize the Browse button, and might have difficulties in understanding what’s going on when they click on it (or fail to click on it).
Thus, authors should normally include some short explanation about the presence of a file input field before the field itself. This can usually be done in a natural way, explaining simultaneously what kind and type of file should be submitted.
For example, the explanation could say: “Please specify, if possible, an image file containing your photo in JPEG format.” Such a note may not help much when a user encounters such a field for the first time in his life, but it helps him to associate the eventual problems with a concept of file input and to explain his problems when seeking help. And if he has tried to use file input before, it tells him to stay tuned to something special, and perhaps at this point, before entering the file input field, to access the file system outside the browser and find the exact path name of the file he wants to submit.
The file is submitted as such, without code conversions. A plain text file is submitted without information about character encoding, so the recipient needs to guess the encoding or infer it somehow.
For example, suppose that you have a UTF-8 encoded form and that it is used to submit a plain text file. If the user wrote the file using Notepad, it will (by default) be in windows-1252 encoding, and its content is sent as such, declared just as text/plain (no charset attribute), even though contents of normal fields are UTF-8 encoded. The server-side form handler has no direct way of knowing what the encoding is, so how can it meaningfully process the data?
In general, the browser cannot tell the encoding, so it can neither declare it nor code-convert the data. The reason is that commonly used file systems lack indication of the character encoding of a plain text file; it just needs to be known.
Thus, if your form is meant for submitting plain text files, your best option is probably to ask users to save their text files in UTF-8 encoding with BOM (Byte Order Mark). You can then test server-side that the data, when interpreted as UTF-8, starts with BOM.
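The server-side BOM test suggested above could look roughly like the following Python sketch. The function name decode_utf8_with_bom is hypothetical, and the sketch assumes that the form handler already has the submitted file content as a byte string.

```python
import codecs


def decode_utf8_with_bom(data: bytes) -> str:
    """Return the text of a submitted file that must be UTF-8 with a BOM.

    Raises ValueError if the BOM is missing or if the rest of the data
    is not valid UTF-8.
    """
    if not data.startswith(codecs.BOM_UTF8):
        raise ValueError("file does not start with a UTF-8 byte order mark")
    try:
        # Strip the three BOM octets before decoding the actual text.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("file is not valid UTF-8") from exc
```

A file saved in windows-1252 fails the first test, so the handler can reject it with a message asking the user to save the file again as UTF-8 with BOM, instead of silently misinterpreting the data.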
There are several possible ways to let people submit their files even when their browsers do not support file fields in forms (or the support is of so poor quality that they don’t want to use it).
You could include a
TEXTAREA
element into the form.
This would work especially for text files in the sense that a user
could open his file in an editor and cut & paste the data
into the textarea. Naturally, this becomes awkward for large files,
but it might still be a good idea to have a textarea along with
a file input field. Your server side script would need some more
code to handle both.
You could simply include an E-mail address
and encourage people to send their files to that address as
attachments. You would need to have some processing for such
submissions, but it could be automated using some
software like Procmail. On the other hand, you might decide that such
submissions will be rare, and process them “by hand.”
Make sure the address is visible on the page
itself. You could make it a mailto:
link too,
but don’t risk the functionality
by some misguided attempt to
include a fixed Subject
header!
Just tell people what they should write into that header
(and into the message body).
Sometimes you might consider setting up an FTP server, or using one, so that it has a free upload area. You would then just specify the server and the area, and people could use their favorite FTP clients. Note that for the submission of a large number of files, FTP would be more comfortable than using a form with a file input field.
Especially for local users, you could just give a physical address to which people can bring or send their files e.g. on diskettes or CD-ROMs. Make it clear to them in advance which media and formats you can handle that way.
input type="file"
in
the description of the input
element in
HTML
4.0 Reference by
WDG.
That document is also available e.g. as
a mirror copy in Denmark. Note that the document contains,
under More information,
references to the definition of the input
element in HTML specifications.
See also notes on RFC 1867.
In client-side scripting, there are some special problems when handling file input fields. The JavaScript Form FAQ contains answers to such questions:
See also notes on filtering above as regards support for event attributes for file input.
The HTML 4.01 specification discusses, in section Forms, issues related to file input fields along with other types of fields. The notes below hopefully help in locating and interpreting the relevant portions.
enctype
attribute
The HTML 4.01 specification
defines an enctype
attribute for the
form
element.
Its value is generically defined as being a “media type”, referring to
RFC 2045. (That
RFC is actually just one part of a large set of documents which
specify what media types are. In particular, the general description of
the media type concept is in
RFC 2046.)
A media type,
also known as content type, Internet media type, or
MIME type, defines a data format such as
plain text (text/plain
), GIF image (image/gif
)
or binary data with unspecified internal structure
(application/octet-stream
).
But in the context of form submission, the use of a media
type as the value of the enctype
attribute is meaningful
only if there is a definition of the conversion to be
done. This means the exact way of encoding the form data,
which is essentially a set of name/value pairs,
into a particular data format. The definition must be rigorous, since
otherwise it is impossible to process the data in a useful, robust way
by computer programs.
The HTML specification defines two possible values for enctype:
enctype="application/x-www-form-urlencoded" (the default)
This presents the fields as name=value strings separated by ampersands (&) and uses some special “escape” mechanisms for characters, such as %28 for the “(” character. It’s confusing if people try to read it; it was meant to be processed by programs, not directly read by humans!
enctype="multipart/form-data"
This presents each field separately and sends the form data as a multipart message containing those presentations as its components. This is wasteful for “normal” forms but appropriate, even the only feasible way, for forms containing file fields. The multipart structure means that each file comes in a nice “package” inside a larger package, with a suitable “label” (content type information) on the inner “package.” This type was originally defined in RFC 1867 but it is also discussed in RFC 2388 (see notes on the RFCs later).
Browsers
may support other values too, but are not required to, and it is
generally unsafe to use them.
Sometimes people use enctype="text/plain"
,
and text/plain
is per se a well-defined media type;
but there is no specification of the exact method of encoding
a form data set into such a format, and browsers are not required to
support such an attribute—so anything may happen if
you use it.
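As an illustration of the default application/x-www-form-urlencoded encoding, here is a small Python sketch using the standard urllib.parse module. The field names and values are made up for the example.

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical form data: two text fields, one containing parentheses.
fields = [("textline", "a (b)"), ("lang", "en")]

# name=value pairs joined by "&"; "(" becomes %28 and a space becomes "+".
encoded = urlencode(fields)
print(encoded)

# A library routine reverses the encoding on the receiving side.
decoded = parse_qsl(encoded)
print(decoded)
```

The encoded string is compact for short text values, but every octet outside a small safe set is escaped, which is part of the reason why this format is unsuitable for file content.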
Normally you should not try to re-invent the wheel by writing code which interprets (decodes) the encoded form data. Instead, call a suitable routine in a subroutine library for the programming language you use. It typically decodes the data into a convenient format for you to process in your own code.
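For instance, in Python the standard email package can split a multipart/form-data body into its parts, so that no hand-written parsing is needed. The following is a minimal sketch, not a complete form handler: the function name parse_multipart, the sample boundary, and the sample field names are assumptions made for the illustration, and the approach is best suited to small, mostly textual submissions.

```python
from email import message_from_bytes
from email.policy import HTTP


def parse_multipart(content_type: str, body: bytes) -> dict:
    """Decode a multipart/form-data body.

    Returns a dict mapping each field name to a (filename, content) pair;
    filename is None for fields that are not file fields.
    """
    # Prepend the Content-Type header so that the parser sees a complete
    # MIME message with the boundary parameter.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw, policy=HTTP)
    fields = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        fields[name] = (part.get_filename(), part.get_payload(decode=True))
    return fields


# Sample submission with one text field and one file field.
sample_body = (
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="textline"\r\n'
    b"\r\n"
    b"hello\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="datafile"; filename="foo.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"file content\r\n"
    b"--XyZ--\r\n"
)
parsed = parse_multipart("multipart/form-data; boundary=XyZ", sample_body)
print(parsed)
```

In practice the content type string and the body would come from the HTTP request, and a Web framework or form-handling library would normally do this decoding for you.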
It seems that the HTML 4.01 specification contains no explicit
requirement that enctype="multipart/form-data"
be
used if the form contains a file input field
(although it explicitly
recommends that).
But e.g. IE 4 and Netscape 4 handle form submissions incorrectly
if the enctype
is defaulted in such a case:
they send the name of the file instead of its content!
The HTML 4.01 specification uses the term
file select for the “control” (i.e. form field)
created by an input type="file"
element. It
specifies file select so
that this control type allows the user to select files
so that their contents may be submitted with a form. Note the
plural “files”—the idea is clearly that one such field
should allow the inclusion of several files.
Note that there is nothing an author needs to do, and nothing he can do, to make a browser allow the selection of several files per input field. It depends on the browser whether that is possible.
However, as described above,
the current browser support is
poor: only some versions of Opera support multi-selection,
and these do not include the newest versions.
And in fact, even if a browser allows users to pick up several files
for one
input type="file"
field, users might not know that
they can do that, or how they can
do that!
Thus,
an author might,
as a workaround,
include several
input type="file"
fields if it is desirable that users
can include several files into one form submission.
Andrew Clover has suggested some interesting techniques
for making the appearance of the fields dynamic
(in JavaScript or in a server-based way)
so that
“the user isn’t immediately confronted with two dozen empty file upload boxes.”
Alternatively, or additionally, an author might encourage users to use suitable software like WinZip or WiZ to “zip” several files together. Naturally the server-side script must then be somehow prepared to handle zipped files.
The HTML 4.01 specification
describes
the value
attribute
for a file input field by saying that browsers (user agents)
“may use the value of the value
attribute as the initial file name.” This however is
usually not supported by browsers. The usual
explanation is “security reasons.”
And indeed it would be a security risk if files from the
user’s disk were submitted without the user’s consent.
It might be all too
easy to lure some users into submitting some password files! But in fact
RFC 1867 duly notes this problem; in section
8 Security Considerations it says:
It is important that a user agent not send any file that the user has
not explicitly asked to be sent. Thus, HTML interpreting agents are
expected to confirm any default file names that might be suggested
with <INPUT TYPE=file VALUE="yyyy">
.
It also mentions (in section 3.4) that the use of value
“is probably platform dependent” but then goes on:
“It might
be useful, however, in sequences of more than one transaction, e.g.,
to avoid having the user prompted for the same file name over and
over again.” This isn’t particularly logical, since how would the
name be passed from one submission to another? (The mechanism for
getting the original file name would be quite unreliable for such
purposes.)
A more useful application could be this: Assume that your form is for
reporting a problem with a particular program, say Emacs, and
that program uses a configuration file with some specific name, say
.emacs
, so that you would very much like to get the user’s
config file for problem analysis. Setting the default name, if supported
by the browser, might be an extra convenience to the user.
Thus, browsers have just failed to implement it, for no good reason. This isn’t a very important flaw, however. The situations where it would make sense to suggest a default file name are rare.
Netscape’s old
HTML Tag Reference says, in
the description of input type="file"
,
that “VALUE=
filename
specifies the initial value of the input element,” but
no actual support to this in Netscape browsers has been reported.
Similar considerations apply to the
corresponding item
in Microsoft’s
HTML Elements reference.
It additionally messes things up by describing the intended
meaning wrong: “Sets or retrieves the value of the
<INPUT type=file>
.” The description links to
a description of the value
attribute which says:
“The value, a file name, typed by the user into the control.
Unlike other controls, this value is read-only.” This probably
relates to using the value
property in
client-side scripting.
And in fact, one can read the value in JavaScript
(and get the filename entered by the user)
but setting it is unsuccessful (without an error message); the same applies
to Netscape (but on Opera, even an attempt to read the value seems
to confuse the browser).
Note that the examples in the above-mentioned documentation
do not contain an input type="file"
element with
a value
attribute.
However,
support to file input in several versions of Opera
handles the value
attribute in the following way:
value
attribute.
Such support, however, is absent in Opera 7.54, for some reason.
The following form contains a file input field with
value="C:\.emacs"
. Your browser probably just ignores
that attribute, but some browsers may use it to set the initial
file name:
An example of Opera’s security alert in the situation discussed above:
There was a short-time bug in Opera 6 that created a security hole, which would have let authors grab users’ files without their knowing, i.e. bypassing the dialogue described above.
RFC 1867 says:
The original local file name may be supplied as well, either as a ‘filename’ parameter either of the ‘content-disposition: form-data’ header or in the case of multiple files in a ‘content-disposition: file’ header of the subpart. The client application should make best effort to supply the file name; if the file name of the client’s operating system is not in US-ASCII, the file name might be approximated or encoded using the method of RFC 1522. This is a convenience for those cases where, for example, the uploaded files might contain references to each other, e.g., a TeX file and its .sty auxiliary style description.
But note that this appears in
subsection 3.3 of section 3. Suggested
Implementation.
Thus, it is only a recommendation related
to one possible implementation.
You shouldn’t count on having a
filename
included.
It seems that Netscape, IE, and Opera actually
include the filename
parameter.
However, only Opera uses the format which
seems to be the intended one,
as deduced from the examples in
RFC 1867
(section 6),
namely a relative name like
foo.txt
, not a full pathname like
C:\mydocs\foo.txt
.
Internet Explorer 7 beta preview behaves similarly,
and this has been explained as a security improvement.
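For illustration, the filename parameter discussed above could be pulled out of a part's Content-Disposition header roughly as follows. This is a minimal sketch, assuming the header value is already available as a string. The function name getFilenameParam is hypothetical, and the sketch deliberately ignores quoted-pair escapes and encoded non-ASCII names.

```javascript
// Hypothetical helper: return the filename parameter of a
// Content-Disposition header value, or null when it is absent.
function getFilenameParam(headerValue) {
  // Accept the quoted form (filename="foo.txt") and the bare token form
  // (filename=foo.txt). Backslashes are kept as-is, since some browsers
  // send full Windows path names without escaping them.
  var match = /;\s*filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(headerValue);
  if (!match) {
    return null; // The parameter is optional, so callers must cope with null.
  }
  return match[1] !== undefined ? match[1] : match[2];
}

// Usage:
getFilenameParam('form-data; name="pic"; filename="foo.txt"'); // "foo.txt"
getFilenameParam('form-data; name="expl"');                    // null
```

Returning null rather than an empty string keeps "no filename sent" distinct from "empty filename sent", which matters because the parameter cannot be counted on.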
Is the Netscape and IE behavior really incorrect? Well, since most computers have some sort of path name system for file names, one would expect to see path names in the examples if the intent had been that path names are sent. This is consistent with the fact that actually using the file names for some meaningful purpose (like the one mentioned in RFC 1867: “the uploaded files might contain references to each other, e.g., a TeX file and its .sty auxiliary style description”) clearly calls for relative file names. When path names are sent, things get much more complicated, since their specific syntax (and interpretation) is strongly system-specific, and there is not even a provision for telling the server what the browser’s file system is. Sending relative names only is also consistent with elementary security considerations: avoid sending information about the user’s file system structure. Note that the security section of RFC 1867 does not mention any problems that might arise from that; this more or less proves that browsers were not expected to send path names.
The idea of including a filename
parameter
makes sense of course, and would apply e.g. to a file
submission containing a set of HTML documents referring to each other
with relative URLs.
However, it’s clear that the processing script
would need to strip off the path part of the names (which is in
principle risky since
C:\mydocs\foo.txt
could be a relative filename
on many systems!). Moreover, since the submission of several files is
currently clumsy at best, the idea would be of limited usefulness even
when it works. (Collections of files that refer to each other by names
would be best handled as packaged into formats such as
application/zip
, leaving the file name issue to be handled
by zipping and unzipping programs, which can preserve relative names as
well as relative directory structures.)
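The path stripping mentioned above might look like the following sketch. It assumes that both the backslash and the slash should be treated as directory separators, which, as noted, is a guess: on some systems a backslash can be a legitimate part of a relative file name. The function name stripClientPath is hypothetical.

```javascript
// Hypothetical helper: keep only the part of a client-supplied file name
// that follows the last "\" or "/".
function stripClientPath(filename) {
  // Assumption: both separators denote directories. lastIndexOf returns -1
  // when a separator is absent, so a plain name is returned unchanged.
  var lastSeparator = Math.max(filename.lastIndexOf("\\"),
                               filename.lastIndexOf("/"));
  return filename.substring(lastSeparator + 1);
}

// Usage:
stripClientPath("C:\\mydocs\\foo.txt"); // "foo.txt"
stripClientPath("foo.txt");             // "foo.txt"
```

Even after stripping, the result is still data supplied by the client, so a form handler would normally check it further before using it as a name on the server's disk.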
The size attribute
Although the user is expected to use the Browse function rather than type the filename(s) into the filename box, the size of the box matters. When the user selects a file by clicking on it, the browser puts the filename into the filename box, and the name is a full pathname which can be quite long. It may confuse users if they see the name badly truncated.
Definition of
input type="file"
in the HTML 3.2 specification
said:
Just like [for] type=text you can use the size attribute to set the visible width of this field in average character widths.
And most browsers seem to treat the size
attribute
that way.
But the HTML 4.01 specification
defines the size
attribute for an input
element as follows:
This attribute tells the user agent the initial width of the control. The width is given in pixels except when type attribute has the value "text" or "password". In that case, its value refers to the (integer) number of characters.
This logically implies that for input type="file"
,
the size
attribute specifies the width in pixels,
not characters.
This is probably an oversight, and
the risk of a browser acting literally according to it
is negligible.
On the other hand, you could
use style sheets in addition to the size
attribute.
Using e.g. the attribute
style="width:25em"
could override the size
attribute; this currently seems to happen
on IE 4 and newer only, but it should do no harm on browsers which don’t
support it.
However, note that although it might seem attractive to use
style="width:100%"
, asking the browser to use as wide a box as
possible, there’s the problem that at least IE 4 puts the Browse button
on the same line as the box. Thus you would in effect force horizontal
scrolling! Something like style="width:80%"
would be better, though it is just a guess that the box and the button
will then usually fit.
Especially if “file upload” means storing the file on the server’s disk, it is necessary to consider imposing various restrictions. It would be nasty if some user filled the disk with gigabytes of junk, by ignorance, or by misclicking, or by malevolence. See section Avoiding Denial of Service Attacks in the documentation of CGI.pm; even if it isn’t directly applicable to you since you use other techniques than CGI and Perl, it gives some food for thought in general.
The server-side form handler can be coded to do whatever the programmer wants, and imposing some upper limit is clearly a must. (That is, the code should check for the input size, and discard, or otherwise process in a special way, submissions exceeding a reasonable limit.)
Any client-side restrictions, i.e. checks done by a browser prior to form submission, are unreliable and should be considered as extra comfort to users only—so that they get a rejection message earlier.
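As a rough illustration of the server-side limit described above, the following sketch accumulates the chunks of a submission and gives up as soon as a limit is exceeded. It assumes a Node.js environment (for Buffer) and that the request body arrives as a sequence of Buffer chunks. The names collectBody and MAX_UPLOAD_BYTES, and the limit value itself, are hypothetical.

```javascript
// Hypothetical limit; choose a value that suits the application.
var MAX_UPLOAD_BYTES = 1024 * 1024;

// Collect body chunks (Node.js Buffers), refusing to keep more than
// maxBytes in total. Returns { ok: true, body } or { ok: false, bytesSeen }.
function collectBody(chunks, maxBytes) {
  var kept = [];
  var total = 0;
  for (var i = 0; i < chunks.length; i++) {
    total += chunks[i].length; // Buffer length is measured in bytes.
    if (total > maxBytes) {
      // Stop at once instead of buffering the rest of an oversized upload.
      return { ok: false, bytesSeen: total };
    }
    kept.push(chunks[i]);
  }
  return { ok: true, body: Buffer.concat(kept) };
}

// Usage:
var result = collectBody([Buffer.from("abc"), Buffer.from("de")], MAX_UPLOAD_BYTES);
// result.ok is true and result.body.toString() is "abcde"
```

Counting the bytes actually received, rather than trusting a declared length, reflects the point that nothing the client says about itself can be relied upon.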
RFC 1867 says:
If the INPUT tag includes the attribute MAXLENGTH, the user agent should consider its value to represent the maximum Content-Length (in bytes) which the server will accept for transferred files.
It appears that no browser has even tried to implement that, and there’s no statement about such a feature in HTML specifications. On the contrary, the HTML 3.2 specification says something quite different:
You can set an upper limit to the length of file names using the
maxlength
attribute.
Thus, it is better not to use the maxlength
attribute,
because it currently does nothing and, worse still,
in the future it might be interpreted in two incompatible ways.
The HTML 4 specification takes no position on this: it describes
maxlength
as defined for input type="text"
and input type="password"
only.
The HTML 4.01 specification defines an accept
attribute
for use with input type="file"
as follows:
This attribute specifies a comma-separated list of content types that a server processing this form will handle correctly. User agents may use this information to filter out non-conforming files when prompting a user to select files to be sent to the server.
Thus you could specify, for example,
accept="image/gif,image/jpeg"
, if you are willing to get
image files in GIF or JPEG format only.
Browsers might use this information to set up the Browse menu
so that only such files are selectable, at least initially.
And
the HTML 3.2 specification even claims:
“Some user agents support the ability to restrict the kinds of files
to those matching a comma separated list of MIME content types given
with the ACCEPT
attribute[;]
e.g. accept="image/*"
restricts files to images.”
(Note that "image/*"
is not a MIME content type. Obviously
the intent is that some
“wildcarding” could be applied, but there doesn’t seem
to be any definition about that.)
But it seems that browser support is currently nonexistent.
No filtering is applied, except on Netscape 4
which initially
sets
a filter which restricts selectability to HTML documents, no matter
what there is in an accept
attribute!
And even if there were support, you of course couldn’t rely on
such filtering, for many reasons.
If it worked, it would be basically for user comfort, not for setting
effective restrictions (which must be imposed by the form handler).
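Since the effective restriction has to live in the form handler, a handler might compare the content type declared for an uploaded part against the same comma-separated list. The sketch below shows one plausible reading of the list, including the "image/*" style of wildcard. As noted above, no specification defines such wildcarding, so this interpretation is an assumption, and the function name matchesAccept is hypothetical.

```javascript
// Hypothetical helper: does contentType match any entry of a
// comma-separated accept list such as "image/gif,image/jpeg" or "image/*"?
function matchesAccept(contentType, acceptValue) {
  var type = contentType.trim().toLowerCase();
  return acceptValue.split(",").some(function (entry) {
    var pattern = entry.trim().toLowerCase();
    if (pattern === "") {
      return false; // Ignore empty entries, e.g. from a trailing comma.
    }
    if (pattern.slice(-2) === "/*") {
      // Assumed wildcard semantics: "image/*" matches any "image/..." type.
      var prefix = pattern.slice(0, -1); // e.g. "image/"
      return type.indexOf(prefix) === 0;
    }
    return type === pattern;
  });
}

// Usage:
matchesAccept("image/gif", "image/gif,image/jpeg"); // true
matchesAccept("text/html", "image/*");              // false
```

The declared content type is itself supplied by the browser, so a check like this only filters out honest mistakes; it does not prove what the file actually contains.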
Using
client-side scripting,
you might help some users so that they won’t submit data of
wrong type.
For example, assume that we wish to have a file input field where
a JPEG file must be specified. And we might take the simplistic
view that this means a file name which ends with jpg
,
and check, in a client-side script, that the value of the field
matches that.
Note that the value is the filename, not the file content.
However one must be extra careful here.
Although the
event attributes
onfocus
, onchange
and onblur
for input type="file"
are supported even in the earliest JavaScript implementations
(from version 1.0), there are limitations and problems.
In particular, onblur
seems to be treated strangely,
and the obvious idea (associating the checking code with
onblur
) seems to make Netscape enter an infinite
loop. Thus, it is probably best to
associate the checks with form submission only.
This means using the onsubmit
attribute in the
form
tag.
Example:
<script type="text/javascript">
// Convenience check only: it inspects the file name, not the content,
// and the server-side form handler must still validate the upload.
function check() {
  var filename = document.f.pic.value;
  if (filename == '') {
    alert('Please select a .jpg file first!');
    return false;
  }
  // Compare the last three characters, ignoring case.
  var ext = filename.substring(filename.length - 3).toLowerCase();
  if (ext != 'jpg') {
    alert('You selected a .' + ext +
          ' file; please select a .jpg file instead!');
    return false;
  }
  return true;
}
</script>
<form method="post" name="f"
      enctype="multipart/form-data"
      onsubmit="return check();"
      action="http://jkorpela.fi/cgi-bin/echo.cgi">
<p>
Please select a JPEG (.jpg) file to be sent:
<br>
<input type="file" name="pic" size="40"
       accept="image/jpeg">
<p>
Please include a short explanation:<br>
<textarea name="expl" rows="3" cols="40"
          onfocus="check();">
</textarea>
<p>
<input type="submit" value="Send">
</form>
The status of the original description of
input type="file"
, namely
RFC 1867,
Form-based File Upload in HTML, is vague.
The HTML 4.01 specification
makes only an
informative reference
to it, and mentions a “work in progress” in this area:
ftp://ftp.ietf.org/internet-drafts/draft-masinter-form-data-01.txt
This is however outdated information; the URL does not work, and
the draft has expired.
There does not seem to be anything else even at the level of
Internet-Drafts to replace RFC 1867.
There is, however,
RFC 2388,
Returning Values from Forms: multipart/form-data,
which might be related to the process. However, it is not
specified as obsoleting RFC 1867.
In the HTML 4.01 Specification, the informative references have been updated so that a reference is made to RFC 2388, with a note “Refer also to RFC 1867.”
In June 2000, RFC 2854, The 'text/html' Media Type, was issued. Its basic purpose was “to remove HTML from IETF Standards Track” officially, i.e. to make it explicit that work on HTML specifications has been moved from IETF to W3C. It explicitly obsoletes RFC 1867, together with some other HTML-related RFCs. But note that there is very little in HTML specifications by the W3C that defines what file input really is; they refer to RFC 1867 instead.
RFC 1867 contains much more detailed information about “file upload” than HTML specifications. It explains the original idea and how it might be implemented. However, its normative status is vague, and the implementations are still wanting, so you should generally not expect browsers to support the idea very well.