textarea elements
Especially in applications where data is entered into a database or stored on disk on the server, some limits must be imposed on the amount of data. It would be nasty if a database crashed or a disk filled up with terabytes of data sent by some user out of ignorance, mistake, or malevolence.
But HTML (up to and including HTML 4.01) offers no way to limit the number of characters entered by the user in a textarea element. Browsers may impose some limits, but they are a problem, not a solution. The server-side script that handles the form submission needs to check for an excessive amount of input. Client-side scripting in JavaScript can be used for auxiliary checks in order to give the user faster feedback when he tries to exceed a limit. This document briefly describes both simple JavaScript checking on form submission and more real-time checking based on counting characters as they are typed.
rows and cols attributes
In an HTML form, a textarea element specifies a text input area. The author can, and indeed must, suggest the visible size of the area, using the rows and cols attributes. But these should not be taken as limits on the amount of data. The HTML specifications have said this explicitly from the very beginning; the HTML 2.0 specification said: "HTML user agents should allow text to extend beyond these limits by scrolling as needed", and the current specification, HTML 4.01, repeats this more verbosely:
- rows = number [CN]
- This attribute specifies the number of visible text lines. Users should be able to enter more lines than this, so user agents should provide some means to scroll through the contents of the control when the contents extend beyond the visible area.
- cols = number [CN]
- This attribute specifies the visible width in average character widths. Users should be able to enter longer lines than this, so user agents should provide some means to scroll through the contents of the control when the contents extend beyond the visible area. User agents may wrap visible text lines to keep long lines visible without the need for scrolling.
All browsers seem to allow the input of an unlimited number of lines in a textarea, except that there can be a browser-specific limit on the total number of characters.
However, several browsers (e.g. Internet Explorer and Opera) violate the above-cited principle on line length. Instead of allowing users to scroll horizontally, they "soft wrap": when line length would exceed the visible width, they visually display the text in two or more lines. The data actually sent by the browser does not contain line breaks at the positions where the browser has "softly" broken a line; only actual line breaks entered by the user are included in the data. This causes confusion: the user sees the text in lines without knowing how it will actually be sent. Sometimes this doesn't matter, sometimes it does.
Some Web authors regard Netscape 4's default behavior (horizontal scrolling when needed) as a problem. This typically indicates that the author is trying to use a textarea element for something other than data input. The Netscape 4 behavior in this respect surely complies with the specifications. Whether IE behavior is a bug or just poor quality is debatable.
By using the attribute wrap="off" in the textarea tag, one can make Internet Explorer behave according to the specifications, without affecting other browsers. It is of course stupid that we need to use a nonstandard attribute to make a browser behave in a standard manner, but what can you do? (Unfortunately, there does not seem to be any way to make Opera behave in this respect; it ignores the wrap attribute.)
Using Cascading Style Sheets (CSS), you can achieve the same effect
with
.
Thus, the wrap attribute can be regarded as outdated.
The wrap attribute is a horrendous kludge in other ways too. It is not in any HTML specification, but it is recognized by (sufficiently new versions of) IE and Netscape as follows:
attribute | wrapping behavior | note |
---|---|---|
wrap="off" | no wrapping; horizontal scrolling if needed | default on Netscape 4 |
wrap="soft" | "soft" wrapping: the browser divides the text into lines to make it fit horizontally but does not thereby introduce actual line breaks into the data | default on IE and Netscape 6 (a mistake) |
wrap="hard" | "hard" wrapping: the browser divides the text into lines to make it fit and thereby introduces actual line breaks into the data | Not supported by Netscape 6. |
Stephanos Piperoglou's generally excellent review HTML 4.0 in Netscape and Explorer says, in section Forms, that IE does not support the values soft and hard but recognizes virtual and physical instead. This does not seem to be correct.
Miko O'Sullivan's detailed Mikodocs Guide to HTML says, in the description of wrap for textarea: "You may from time to time see other variations on WRAP, such as VIRTUAL or PHYSICAL. Netscape introduced these attributes a few years ago as proposed extensions to HTML 3.0, then abandoned them."
It's very hard to say how different versions of Netscape, on different platforms, actually behave. My tests with Netscape 4.04 on WinNT suggest that the default is (correctly) no wrapping, the value off just confirms the default, hard and soft work as described above, but any other value is equivalent to soft! Naturally a browser should ignore an attribute setting it does not recognize, such as wrap="foobar", but Netscape treats it as equivalent to the non-default setting wrap="soft".
Thus, wrap="off" might be useful since it overrides the wrong default (wrap="soft") on IE. It is questionable whether the other values should ever be used.
Although "hard" wrapping might appear to be a way to limit the line length (to the value specified by the cols attribute), this is not a reliable method. The wrap attribute is poorly documented, probably has no effect on most browsers other than IE and Netscape, and may behave differently even on different versions of IE and Netscape. You still need server-side checking (or other processing of overlong lines) if it is essential that lines not exceed a limit you need to set. And users who are accustomed to "soft" wrapping (which is, after all, the default on IE) might easily be lured into thinking that their text just "soft wraps", without realizing that their lines will actually be sent as broken.
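As an illustration of such a server-side check, here is a minimal sketch in JavaScript (the language of the client-side examples in this document; the server side could just as well be written in Perl or anything else). The function name overlongLines and the idea of reusing the cols value as the maximum line length are assumptions made for this example. It splits the data at line breaks, which browsers send as CR LF, and reports every line that is too long:

```javascript
// Hypothetical helper: list the lines of submitted textarea data that
// are longer than maxLen characters. Browsers send line breaks as CR LF,
// but bare LF or CR are accepted too, to be safe.
function overlongLines(text, maxLen) {
  const problems = [];
  text.split(/\r\n|\r|\n/).forEach((line, index) => {
    if (line.length > maxLen) {
      // 1-based line number and the actual length of the line
      problems.push({ line: index + 1, length: line.length });
    }
  });
  return problems;
}

// Example: the intended maximum line length is 20 (as in cols="20")
const found = overlongLines("short line\r\nthis line is definitely too long", 20);
console.log(found); // one entry: line 2, length 32
```

Whether the script then rejects the submission or breaks the lines itself is a design decision; either way, the decision is made on the server rather than left to the wrap attribute.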
Wrapping implies some potentially very nasty effects. When wrapping, either "soft" or "hard", is applied, the browsers (Netscape, IE, Opera) basically break between "words", i.e. strings separated by spaces. This is natural and acceptable, assuming that wrapping is acceptable at all. But if a "word" is longer than the textarea width, as set by the cols attribute, the browsers will split it! For example, in a textarea with cols set to 20 and the prefilled one-line content "Supercalifragilisticexpialidocious.", your browser will probably split the string into two lines.
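To make the effect concrete, here is a rough JavaScript simulation of such wrapping. It is an assumption for illustration, not the actual algorithm of any browser: it breaks at spaces where it can and chops a "word" that is longer than cols into pieces. With "hard" wrapping, the resulting rows would be joined with CR LF in the data actually sent, which is how a long URL gets cut into pieces.

```javascript
// Rough simulation (not any browser's real algorithm) of wrapping one
// logical line to a width of `cols` characters: break at spaces where
// possible, and split a "word" that is longer than the whole width.
function wrapLikeBrowser(line, cols) {
  const rows = [];
  let current = "";
  for (const word of line.split(" ")) {
    // The row we would get by appending this word to the current row
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length <= cols) {
      current = candidate;
      continue;
    }
    if (current !== "") {
      rows.push(current);
    }
    // A word that does not fit on a row of its own is chopped up
    let rest = word;
    while (rest.length > cols) {
      rows.push(rest.slice(0, cols));
      rest = rest.slice(cols);
    }
    current = rest;
  }
  if (current !== "") {
    rows.push(current);
  }
  return rows;
}

console.log(wrapLikeBrowser("Supercalifragilisticexpialidocious.", 20));
// → [ 'Supercalifragilistic', 'expialidocious.' ]
```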
Such splitting can be disastrous if the user wants to type a URL into a textarea, especially if "hard" wrapping is on, since the URL will actually be split into pieces. Even with "soft" wrapping the user will be confused by the apparent splitting; how can he know whether it will actually be sent that way? (You can hardly expect a normal user to peek at the HTML source and consult references which say what it causes and figure out which reference gets it right.)
As if this weren't enough trouble, IE takes liberties in splitting words and "words". Just as it splits a word containing a hyphen after the hyphen when formatting normal data for display, it will break foo-bar into foo- and bar in textarea input if the first part still fits into the line but the second part won't. It also breaks after several special characters, which is harmful for URLs and other strings like foobar%zap, especially when hard wrapping is in effect.
Since Netscape 6 seems to have soft wrapping on by default, and it does not even honor wrap="off", and since IE 6 and Opera 6 keep wrapping too, should we deduce that the original intended processing of textareas has been replaced by a "de facto" browser standard? Maybe, but that's very unfortunate.
The approach described above has been criticized for not being user-friendly. And there's certainly a point in the note that the "HTML way" of text input does not correspond to the intuitive expectations of people acquainted with text processing programs. A Usenet article by Simon Brooke summarizes this well:
Textareas are for input of larger amounts of text. Sometimes this text necessarily has arbitrarily long lines. Very often it doesn't. Naive users, or users carrying expectations over from other software, become confused and disoriented either when the caret goes out of the viewport, or the viewport scrolls laterally. For these users, the 'valid' form of the textarea widget is a user-hostile control.
The fundamental problem here is that there are two different mental models (and corresponding implementations) of "typing text". The first one, the older one, is based on explicit line breaks entered by the user. It corresponds to typing on a typewriter; it is common among programmers, and it is also the model on which several Internet protocols (like E-mail and Usenet protocols) are based. The second one, now more common among "ordinary users", was introduced by text processing programs (as opposed to text editors), and it means that the user need not, and normally should not, hit Enter or Return but just watch the program divide the text into lines. In this model, Enter or Return generally means end of paragraph.
Quite some confusion has arisen when the two models, or conventions, have been used in the same environment without any conventions and arrangements for conversions. The confusion is described somewhat more technically at the end of an otherwise all-too-technical Unicode report on newline guidelines.
The HTML specifications are clearly based on the first mental model as regards textarea. Quite a few browsers have implemented textarea more or less according to the "text processing" model, at least optionally. And it's optional in the sense of being, to some extent, settable by the author.
This adds confusion to confusion. If the textarea element were specified more flexibly, it could be a browser option whether it works in "typewriter mode" or "text processing mode". Each user could then select the method he is familiar with, or just prefers. But now users have to guess how each textarea works; you can't see it without trying, or peeking at the HTML markup.
Since both browser behavior and user behavior (i.e., users' understanding of how their input will be handled by the browser and by the server, if they can tell the difference) vary, you cannot really know much about the intended newlines in input. You can't even know, in general, whether they were entered by the user or inserted by the "friendly" browser. So if the input is text that might logically consist of paragraphs, you can't recognize paragraphs without special conventions. The best workaround is probably an explicit statement "please use an empty line between paragraphs", if you wish to be able to recognize paragraphs, as you probably do quite often, even for a simple guestbook application.
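If you adopt that convention, recognizing paragraphs on the server is simple. A minimal JavaScript sketch, with a hypothetical function name: it normalizes line breaks first (browsers send CR LF, but other forms may appear), then treats one or more empty or whitespace-only lines as a paragraph separator. Line breaks inside a paragraph are kept as they are, since we cannot tell whether the user or the browser put them there.

```javascript
// Hypothetical helper: split text into paragraphs separated by empty lines.
function splitParagraphs(text) {
  // Normalize CR LF and bare CR to LF so that only one form needs handling
  const normalized = text.replace(/\r\n|\r/g, "\n");
  return normalized
    .split(/\n[ \t]*\n/)       // a blank or whitespace-only line separates paragraphs
    .map((p) => p.trim())       // strip spaces and stray line breaks at the edges
    .filter((p) => p !== "");   // several blank lines in a row yield empty pieces
}

console.log(splitParagraphs("First line\r\nstill first.\r\n\r\nSecond one."));
// two paragraphs; the first keeps its internal line break
```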
Some browsers impose some limits on the amount of data that can be entered in a textarea. Limits like 32 or 64 kilobytes (32,768 or 65,536 characters) have been observed. Such limits, if they exist, are caused by simplistic implementations, and they are independent of the values of the rows and cols attributes.
Such limitations cannot be taken as solutions to the problem of limiting textarea input size. They are just browser-specific limitations (which shouldn't really exist, and will hopefully be removed in new versions). Instead, they constitute a problem.
It is unlikely that a user takes the trouble of typing more than 32,768 characters into a textarea when filling out a form. Browsers' user interfaces for such purposes are generally very poor, with extremely limited editing capabilities. But a user might cut and paste some long text which he has typed using an editor or a text processing program.
If you expect that some users might wish to include very long texts when filling out a form of yours, consider making it possible to use alternative methods of data submission. Depending on the case, this might mean one or more of the following:
support@ourcompany.example. Please include the text data as an attachment (both plain text and MS Word format are accepted), and please specify the following information in the message body: your full name - -"
textarea vs. input type="text"
Basically, a textarea is for unlimited, usually multi-line input of text, whereas input type="text" is for single-line input.
For input type="text", we can use the size attribute to specify the visible size of the field, in characters. But we can also use the maxlength attribute to specify the maximum number of characters that can be entered. Browsers generally enforce such a limit. However, an author should still assume that the limit can be exceeded and test things server side (as explained below); for the reasons for this "paranoia", see some words of warning (in How can I make a field readonly in a Web form?).
Thus, for small amounts of user input, you could use one or more input type="text" elements instead of a textarea element. If you need to include several single-line input fields, note that the maxlength attributes set a separate limit on each field, and that a user who has typed a line and wishes to continue cannot simply keep typing or press Enter. (He can use tabbing, though.) In fact, pressing Enter in a single-line input field relatively often submits the form! So this approach is a bit problematic. (There's of course the additional problem that the server-side script needs to process all the single-line input fields, perhaps concatenating them into one string, adding line breaks or spaces between the values.)
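The server-side processing mentioned in the parenthetical remark could look roughly like the following JavaScript sketch. The function name, the per-field limit parameter, and the choices to join with CR LF and to drop trailing empty fields are all assumptions. Note that each field is checked again on the server, since maxlength cannot be relied on.

```javascript
// Hypothetical server-side helper for a form made of several
// single-line fields: check each value against its own limit (the
// browser's maxlength enforcement cannot be trusted), then join the
// values into one string with CR LF between them.
function joinFields(values, maxEach) {
  values.forEach((value, i) => {
    if (value.length > maxEach) {
      throw new RangeError("Field " + (i + 1) + " exceeds " + maxEach + " characters");
    }
  });
  // Leave out trailing empty fields so that unused lines add nothing
  let end = values.length;
  while (end > 0 && values[end - 1].trim() === "") {
    end--;
  }
  return values.slice(0, end).join("\r\n");
}

console.log(JSON.stringify(joinFields(["line one", "line two", "", ""], 40)));
// → "line one\r\nline two"
```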
Generally, the server-side script that handles a form submission should perform data consistency and acceptability checks on the form data before doing anything else.
At the simplest, the form handler could first just look at the Content-Length header in the HTTP headers and discard the submission (politely, perhaps) if there is no such header or its value is larger than some limit. But you would still need code for actually processing the data when its size is acceptable, and for checking the amount of data actually received, since a cracker could have faked Content-Length: 42 and still send you megabytes of junk.
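A minimal JavaScript sketch of both checks, under the assumption that the request body is available as a sequence of chunks (byte arrays) as it arrives; the function names are hypothetical. The header test is only a cheap first filter, and the running count of bytes actually received is what protects against a faked header. Both values are in bytes, not characters.

```javascript
// Hypothetical first filter: accept only a well-formed Content-Length
// header whose value does not exceed the limit (in bytes).
function contentLengthAcceptable(headerValue, limit) {
  if (typeof headerValue !== "string" || !/^\d+$/.test(headerValue)) {
    return false; // missing or malformed header: reject
  }
  return Number(headerValue) <= limit;
}

// Hypothetical second check: count the bytes actually received, chunk by
// chunk, and give up as soon as the limit is passed, whatever the header said.
function bodyWithinLimit(chunks, limit) {
  let received = 0;
  for (const chunk of chunks) {
    received += chunk.length; // chunk is a byte array, e.g. a Uint8Array
    if (received > limit) {
      return false;
    }
  }
  return true;
}

// A faked "Content-Length: 42" passes the first test, but the body does not
console.log(contentLengthAcceptable("42", 1000));          // true
console.log(bodyWithinLimit([new Uint8Array(5000)], 1000)); // false
```

In Node.js, for instance, the header value is found in req.headers["content-length"] and the body arrives in "data" events, so the running count would be updated in the event handler and the request abandoned as soon as the limit is passed.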
In such checks, checking for the amount of data entered in a textarea is usually rather simple. At the simplest, you just get the length of the data in characters and compare it against a limit. The implementation depends on the server-side interface technology (CGI, ASP, something else) and on the programming or scripting language used (Perl, C, C++, sh, whatever).
As a very simple illustration, consider the following form:
<form action="http://jkorpela.fi/cgi-bin/chkarea.pl" method="post">
Please enter data, at most 42 characters:<br>
<textarea name="box" rows="5" cols="30"></textarea>
<br><input type="submit">
</form>
The script that handles the submission just checks the amount of data corresponding to the textarea. In a real-life situation, this would be preliminary to any further processing. In a CGI script written in Perl, using the CGI.pm module, the code is essentially the following:
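use strict;
use warnings;
use CGI;
my $query = CGI->new;
my $limit = 42;  # the limit stated in the form
print $query->header('text/plain');
if (!defined($query->param('box'))) {
  print "No data included under the expected name - submission rejected.\n";
} elsif (length($query->param('box')) > $limit) {
  print "Too much data!\n";  # note: each line break counts as two (CR LF)
} else {
  print "The data was virtually accepted.\n";
}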
A variant of this form, identical except for the action attribute, can point to a script that sends back a copy of the form to be fixed and resubmitted if there is too much data.
The CGI.pm module contains handy tools for creating such forms, which contain, as prefilled data, user input from a previous form submission, optionally after some editing.
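The same idea can be sketched without CGI.pm. Here it is in JavaScript, with hypothetical function names and markup modeled on the sample form above. The essential point is that the user's previous input must be HTML-escaped before it is placed inside the textarea element; otherwise text such as </textarea> typed by the user would break the form.

```javascript
// Escape the characters that are significant in HTML markup
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")   // must come first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build a copy of the form with the previous input
// prefilled, so that the user can shorten it and resubmit.
function resubmitForm(action, value, limit) {
  const excess = value.length - limit;
  return [
    `<form action="${escapeHtml(action)}" method="post">`,
    `<p>Please remove at least ${excess} characters (limit: ${limit}).</p>`,
    // Escaping keeps the user's data from being interpreted as markup
    `<textarea name="box" rows="5" cols="30">${escapeHtml(value)}</textarea>`,
    `<br><input type="submit">`,
    `</form>`,
  ].join("\n");
}

console.log(resubmitForm("/hypothetical-handler", "x".repeat(45), 42));
```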
Note: A line break in a textarea counts as two characters. The reason is that it is presented, in the data, as two control codes ("control characters"), namely carriage return (CR) and linefeed (LF). Reference: HTML 4.01 Specification, section Form content types.
The statement above applies to "hard returns", which are actually sent by the browser as part of the data, as opposed to any "soft returns", i.e. line breaks that browsers merely display to the user. See Implementations, especially wrapping, above.
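Consequently, a client-side script that counts characters may see fewer characters than the server does: in current browsers, the line breaks in a textarea's script-visible value are typically single LF characters, while the submitted data uses CR LF. Here is a small JavaScript sketch (hypothetical function name) that counts the way the server will, as far as hard returns entered by the user are concerned. Line breaks added by wrap="hard" are not included.

```javascript
// Hypothetical helper: length of a textarea value as it will be counted
// in the submitted data, where every line break is sent as CR LF.
function submittedLength(value) {
  // Turn every form of line break (CR LF, bare LF, bare CR) into CR LF
  return value.replace(/\r\n|\r|\n/g, "\r\n").length;
}

console.log("two\nlines".length);           // 9
console.log(submittedLength("two\nlines")); // 10
```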
It is possible to write simple (or complicated) client-side scripting code in JavaScript in order to help the user to stay within the given limits. This could be based on checking the amount of data (entered in a textarea) when the user is about to submit the form, or to move to the next field in the form, or even "real-time" as he types the text.
Since one cannot rely on JavaScript being enabled, the client-side checks should be regarded as extra convenience only, to those users who can and wish to make use of it.
At the simplest, you could use just an onsubmit attribute in the form tag, containing JavaScript code like the following (for our sample form discussed above, with name="ourform" attached to it):
onsubmit="return ok(42);"
with ok() defined as
function ok(maxchars) {
  var length = document.ourform.box.value.length;
  if (length > maxchars) {
    // Tell the user how many characters must go, and cancel submission
    alert('Too much data in the text box! Please remove ' +
      (length - maxchars) + ' characters');
    return false;
  }
  return true;
}
The return false statement means that normal form submission does not take place. In practice, you would probably want to make that code a function
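For a form with several textareas, the check in ok() can be generalized. The following JavaScript sketch uses a hypothetical function name and data layout. It keeps the limits in one place and separates the counting from the DOM access, so the logic can be tested outside a browser. In the browser, each value would be taken from document.ourform.elements[name].value.

```javascript
// Hypothetical generalization of ok(): `fields` maps each field name to
// its current value and its limit; one message is produced per field
// that is too long, worded like the alert in ok().
function limitMessages(fields) {
  const messages = [];
  for (const [name, { value, max }] of Object.entries(fields)) {
    if (value.length > max) {
      messages.push("Too much data in the field " + name +
        "! Please remove " + (value.length - max) + " characters");
    }
  }
  return messages;
}

const msgs = limitMessages({
  box: { value: "x".repeat(45), max: 42 },
  comment: { value: "short", max: 100 },
});
console.log(msgs); // one message, for the field box (3 characters too many)
```

An onsubmit handler would then alert the joined messages and return false whenever the list is not empty.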
It would be possible to perform such checks when the user leaves a textarea, e.g. by tabbing to the next field or clicking on another field. One could then use the onblur attribute, or onfocus attributes for other fields. Such "intermediate" solutions (as opposed to checking on submit or checking while typing) could be especially useful when there are several textarea fields in the form and you wish to give immediate feedback when a limit is about to be exceeded. The following JavaScript code, to be used e.g. in
<textarea name="box_name" onchange="maxlength('box_name', 42)" ...>
and assuming the form is named pooh, was suggested by Oliver Tickell:
function maxlength(element, maxvalue) {
  // Look the field up by name instead of building an expression for eval()
  var q = document.pooh.elements[element].value.length;
  var r = q - maxvalue;
  var msg = "Sorry, you have input " + q + " characters into the " +
    "text area box you just completed. It can return no more than " +
    maxvalue + " characters to be processed. Please abbreviate " +
    "your text by at least " + r + " characters";
  if (q > maxvalue) alert(msg);
}
With JavaScript enabled, the input volume is then checked as soon as the user leaves the textarea field (e.g. by tabbing) after changing its content.
In order to count characters as the user types them, we need JavaScript 1.2 features, so this enhancement won't work on all JavaScript-enabled browsers. (See Events and Event Handlers by Martin Webb for information on support for event handlers in different JavaScript implementations.)
Our approach is to use the onkeyup attribute for the textarea element and associate some checking code with it. Specifically, the code updates a text field displaying the length of the value of the textarea field, i.e. the number of characters entered. It also checks that value against the given limit. There are different things we could do when the limit is exceeded. One approach is to display an alert message. We can also add code that changes the counter display to red and bold, though the features needed for this currently work on IE 4+ only; but it's only an additional hint. This way, the user can continue typing and later delete something from the area to get below the limit.
Sample code for the routine to be invoked via the onkeyup attribute is:
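var limit = 8; // the size limit; the sample form uses eight characters

function update() {
  var old = document.f.counter.value;
  document.f.counter.value = document.f.box.value.length;
  // Counter values are strings; convert them before comparing
  var now = Number(document.f.counter.value), before = Number(old);
  if (now > limit && before <= limit) {
    // The limit was just exceeded: warn once and highlight the counter
    alert('Too much data in the text box!');
    if (document.styleSheets) {
      document.f.counter.style.fontWeight = 'bold';
      document.f.counter.style.color = '#ff0000';
    }
  } else if (now <= limit && before > limit && document.styleSheets) {
    // Back within the limit: remove the highlighting
    document.f.counter.style.fontWeight = 'normal';
    document.f.counter.style.color = '#000000';
  }
}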
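The essential logic of update() is that the alert appears only when the count crosses the limit upwards, not on every keystroke while the text is too long. The following JavaScript sketch extracts that decision into a pure function (a restructuring for illustration; the name and return values are assumptions), which makes the behavior easy to check outside a browser:

```javascript
// Decision logic of update(), restated as a pure function: given the
// previous and the new character count, say what should happen to the
// alert and the highlighting of the counter.
function counterTransition(oldCount, newCount, limit) {
  if (newCount > limit && oldCount <= limit) {
    return "exceeded";  // alert once, make the counter red and bold
  }
  if (newCount <= limit && oldCount > limit) {
    return "restored";  // back within the limit: normal style again
  }
  return "unchanged";   // no crossing, so no new alert
}

// With the sample limit of eight characters:
console.log(counterTransition(8, 9, 8));  // → exceeded
console.log(counterTransition(9, 12, 8)); // → unchanged
console.log(counterTransition(9, 7, 8));  // → restored
```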
Our sample form for this technique sets, for testing purposes, the textarea size limit to a ridiculously small value (eight characters). It has been written, using a noscript element, so that when JavaScript is not enabled at all, the user sees a message explaining what could be achieved by using a JavaScript-enabled browser:
(If you used a JavaScript enabled browser, preferably supporting JavaScript version 1.2 or equivalent (as supported e.g. by Internet Explorer 4 and Netscape Navigator 4), you would have some help from the browser in trying to remain within the limit.)
For JavaScript-enabled browsers not supporting JavaScript 1.2, only the simple checking on submit is in operation. To avoid confusing users, we won't include the counter field when it doesn't work. This is achieved by generating the markup for it dynamically, with code that should get executed by JavaScript 1.2 capable browsers only. Unfortunately, the method for this, the inclusion of language="JavaScript1.2" in the script element, does not seem to work on Opera, i.e. Opera users with JavaScript enabled will see a counter field which doesn't work.
Note that the onkeyup attribute will not capture the event of cutting and pasting text in the textarea using a mouse. This means, e.g., that if the user pastes an excessively long piece of text into the textarea, the counter displays a wrong value until the user hits a key when focus is in the textarea. This is hopefully tolerable. If the user tries to submit the form, the onsubmit test will inform him about the problem, and as soon as he starts editing the area, the counter helps him to get below the limit.