
This illustration shows each of the four main elements of a URL: protocol, domain name, file path, and file name.
PROTOCOL
The first element in any URL is the Protocol. In the case of web pages,
the protocol will almost always be "http", which stands for "HyperText
Transfer Protocol." Basically, this tells your browser that it will be
loading a web page. As mentioned earlier, web browsers can use other protocols
to access other kinds of information on the Internet. More specifically,
in addition to "http", this protocol element of the URL could also read
"gopher" or "ftp". Most modern browsers assume that users are looking for
web pages, so the protocol is optional. If the user simply types in www.abacon.com,
for example, the browser will automatically fill in the "http://" section
of the URL.
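The fill-in behavior described above can be sketched in a few lines of Python. This is only an illustration, not how any particular browser is implemented: the function name fill_in_protocol and the address ftp.example.com are made up for the example, and the check for "://" is a deliberately simple assumption.

```python
from urllib.parse import urlsplit

def fill_in_protocol(typed, default="http"):
    """Return the typed address, adding a protocol if the user left it out.

    Hypothetical helper: a real browser uses more elaborate rules.
    """
    if "://" in typed:
        return typed  # the user already supplied a protocol
    return f"{default}://{typed}"

# ftp.example.com is a placeholder address, not one from the text.
for typed in ["www.abacon.com", "http://www.niu.edu/english/", "ftp://ftp.example.com/pub/"]:
    url = fill_in_protocol(typed)
    print(urlsplit(url).scheme, "->", url)
```

An address typed without a protocol comes back with "http://" in front, while one that already names "ftp" or "http" is left alone.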
One final word about protocol: most new web browsers can also
be used to browse documents on a user's computer. Instead of typing in
a web page address, the user might type in something like "c:\". This would
direct the web browser to display the contents of the computer's "C" drive.
The actual URL would look like this: "file:///c|/". We can
add "file" as another protocol that web browsers can use; Microsoft Windows
98 makes use of this feature of web browsers by enabling users to perform
all navigation on their computer through a web page-like interface.
SERVER DOMAIN NAME
The second element to every URL is the server domain name, which
is like the street address of the web server. Basically, the domain name
tells the browser where it can find the web page in question, and in theory,
the domain name reads much like a street address, from most specific to
most general. In the above example, the domain name consists of three parts:
"www", "niu", and "edu". Going in reverse, we see that "edu" tells us the
web page in question is associated with an educational institution. Other
domain types include "com" (a commercial site), "org" (a non-profit organization),
"gov" (a governmental site), and "net" (originally meant for network providers). The
last part might instead be a country code like "us" (United States), "uk" (United
Kingdom), "jp" (Japan), or "ca" (Canada).
The next part, "niu", specifies which educational institution we'll be looking
at: in this case, Northern Illinois University. The final part, "www",
tells us what kind of server we'll be accessing: in this case, a web server
(slightly redundant given the definition of the "http" protocol, but standard).
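Reading a domain name "in reverse" is easy to try out. The following minimal sketch uses Python's standard urllib.parse module; the function name domain_parts_general_to_specific is invented for this example.

```python
from urllib.parse import urlsplit

def domain_parts_general_to_specific(url):
    """Split the domain name on dots and reverse it, most general part first."""
    host = urlsplit(url).hostname  # "www.niu.edu" for the sample URL
    return list(reversed(host.split(".")))

print(domain_parts_general_to_specific("http://www.niu.edu/english/english_home.html"))
# prints ['edu', 'niu', 'www']
```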
FILE PATH
The third element included in a URL is the file path. This element
tells the browser where on the server to look for the requested web page.
In the example above, the file path specifies "english", so the web browser
will look on the server for a folder called "english." File paths
can include nested folders as well. For example, consider the following
URL: http://www.niu.edu/english/classes/ceh/main.html. In this example,
the file path specifies several layers of folders. First, the browser will
look for a folder called "english." Assuming it finds that folder, it will
look for a folder called "classes" within the "english" folder; then it
will look for "ceh" inside "classes."
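The folder-by-folder lookup can be listed out with a short Python sketch. It assumes the last piece of the path is a file name (as in the example URL) and leaves it off; the function name folders_in_order is made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

def folders_in_order(url):
    """Return the nested folders in a URL's file path, outermost first.

    Assumes the final path segment is a file name, so it is dropped.
    """
    path = PurePosixPath(urlsplit(url).path)
    # parts begins with "/" (the server root) and ends with the file name.
    return list(path.parts[1:-1])

folders = folders_in_order("http://www.niu.edu/english/classes/ceh/main.html")
print(folders)  # prints ['english', 'classes', 'ceh']
```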
FILE NAME
The final element to a URL is the actual file name of the web page
in question. In the example above, the file name of the web page we are
looking for is "english_home.html". Note that most web pages will end in
".htm" or ".html".
One special exception to this final URL element concerns servers that
use "default documents." For example, if a user were to type in http://www.niu.edu/english/,
one of three things would happen. First, they might get an error that told
them the page they requested couldn't be found. This would happen because
the user did not enter a file name and the server has nothing to send in
its place. Second, the user might go to a
default page. In this case, the web server knows that if users do not enter
a file name, it should automatically look for a file called "index.html"
and send that. This "index.html" file is a normal web page, so it could look
like anything at all. Third, the user might receive a list of all the files
currently in that folder (the folder called "english" in this case). This
is called "directory browsing", and is basically the same thing that a
user does when he or she looks at an index of the files on their floppy
disk. Some servers do not allow directory browsing, but some do. Thus,
if a user does not enter a file name, the server will probably look for
a file called "index.html". If it finds one, that page will be loaded.
If it does not, the server will check whether it is allowed to list the
contents of the folder. If it is, it will. If it isn't, then the user will
get an error message (probably to the effect that directory browsing is
not allowed, or permission denied).
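The three possible outcomes can be written as one small decision function. This is a simplified model, not the code of any real web server: the function name, its parameters, and the sample file list are all assumptions made for the example, and the decision is modeled on the server side, where it is normally made.

```python
def respond_to_folder_request(files_in_folder,
                              default_document="index.html",
                              directory_browsing=False):
    """Decide what to send back when a URL names a folder but no file."""
    if default_document in files_in_folder:
        return ("page", default_document)          # the default page is sent
    if directory_browsing:
        return ("listing", sorted(files_in_folder))  # a list of the folder's files
    return ("error", "directory browsing is not allowed")

# Hypothetical contents of the "english" folder.
files = {"english_home.html", "faculty.html"}
print(respond_to_folder_request(files))
print(respond_to_folder_request(files, directory_browsing=True))
print(respond_to_folder_request(files | {"index.html"}))
```

The order of the checks matters: the default document is tried first, the listing second, and the error is the fallback.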
To sum up how URLs work, let's take another look at our sample URL:
http://www.niu.edu/english/english_home.html. The protocol tells the browser
that it should look for a web page. The domain name tells the browser that
it should look for a web server at an educational institution called NIU.
The file path tells the browser to look for a folder called "english" on
the web server. The file name tells the browser which page in the
"english" folder it should copy and display for the user. That's all clear
now, right?
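For readers who want to check the summary themselves, here is a minimal Python sketch that pulls the four elements out of the sample URL using the standard library. The function name url_elements and the dictionary keys are chosen for this example only.

```python
import posixpath
from urllib.parse import urlsplit

def url_elements(url):
    """Split a URL into protocol, domain name, file path, and file name."""
    parts = urlsplit(url)
    # Everything before the last "/" is the file path; the rest is the file name.
    file_path, file_name = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "domain_name": parts.hostname,
        "file_path": file_path,
        "file_name": file_name,
    }

for name, value in url_elements("http://www.niu.edu/english/english_home.html").items():
    print(f"{name}: {value}")
```

For the sample URL this reports "http", "www.niu.edu", "/english", and "english_home.html".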
Let's take an example:
http://www.engl.niu.edu/ceh/104/
This process is useful for discovering the relationship between any particular document and the hierarchical structure of the path from the server root (the page you get when looking at the domain name alone) to the page in question.
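Assuming that backtracking means trimming a URL one path segment at a time until only the domain name is left (the steps are not spelled out here, so this is an interpretation), the process can be sketched in Python as follows. The function name backtrack is invented for the example.

```python
from urllib.parse import urlsplit, urlunsplit

def backtrack(url):
    """Return the URLs reached by removing one path segment at a time.

    Assumption: backtracking walks from the page up to the server root.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    urls = []
    while segments:
        segments.pop()  # drop the last folder or file name
        path = "/" + "/".join(segments)
        if segments:
            path += "/"  # keep the trailing slash on folder URLs
        urls.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return urls

for step in backtrack("http://www.engl.niu.edu/ceh/104/"):
    print(step)
```

For the example URL this prints http://www.engl.niu.edu/ceh/ followed by http://www.engl.niu.edu/, each one a level closer to the server root.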