This tutorial covers the basics of the World Wide Web, focusing on its
technical aspects. After all, the Web is a technological phenomenon.
Therefore it's useful to understand some of the fundamentals of how it
works.
The World Wide Web is a system of Internet servers that supports
hypertext and multimedia to access several Internet protocols on a
single interface. The World Wide Web is often abbreviated as the Web or
WWW.
The World Wide Web was developed in 1989 by Tim Berners-Lee of the
European Particle Physics Lab (CERN) in Switzerland. The initial purpose
of the Web was to use networked hypertext to facilitate communication
among its members, who were located in several countries. Word soon
spread beyond CERN, and rapid growth in the number of both developers
and users ensued. In addition to hypertext, the Web began to incorporate
graphics, video, and sound. The use of the Web has reached global
proportions and has become a defining element of human culture in an
amazingly short period of time.
In order for the Web to be accessible to anyone, certain agreed-upon
standards must be followed in the creation and delivery of its content.
An organization leading the efforts to standardize the Web is the World
Wide Web Consortium (W3C). Take a look at the W3C Web site to get an
idea of its activities. A lot of the material is technical because,
after all, the Web is a technical phenomenon.
Protocols of the Web
The surface simplicity of the Web comes from the fact that many individual protocols can be contained within a single Web site.
Internet protocols are sets of rules that allow for intermachine
communication on the Internet. These are a few of the protocols you can
experience on the Web:
HTTP (HyperText Transfer Protocol): transmits hypertext over networks. This is the protocol of the Web.
E-mail (Simple Mail Transfer Protocol or SMTP): distributes e-mail messages and attached files to one or more electronic mailboxes.
FTP (File Transfer Protocol): transfers files between an FTP server and a computer, for example, to download software.
VoIP (Voice over Internet Protocol): allows delivery of voice communications over IP networks, for example, phone calls.
The Web provides a single, graphical interface for accessing these and
other protocols. This creates a convenient and user-friendly
environment. Once upon a time, it was necessary to know how to use
protocols within separate, command-level environments. This meant you
needed to know the text commands and type them out to make things
happen. The Web is much easier, since it gathers these protocols
together into a unified graphical system. Because of this feature, and
because of the Web's ability to work with multimedia and advanced
programming languages, the Web is by far the most popular component of
the Internet.
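As a rough illustration of the HTTP entry in the list above, the following Python sketch starts a throwaway Web server on the local machine and requests one page from it. The handler class HelloHandler, the helper fetch_once, and the page text are invented for this example; only the Python standard library is assumed.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET request with one small page."""

    def do_GET(self):
        body = b"<p>Hello from a tiny Web server.</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet; no per-request log lines


def fetch_once(path="/"):
    """Start a local server, send one HTTP GET, and return what came back."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.getheader("Content-Type"), response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, content_type, body = fetch_once("/index.html")
print(status, content_type)
print(body.decode("utf-8"))
```

The client half (the request and the response it reads) is what a browser does on your behalf each time you follow a link.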
Hypertext and links: the motion of the Web
The operation of the Web relies primarily on hypertext as its means of
information retrieval. Hypertext is a document
containing words that connect to other documents. These words are called
links and are selectable by the user. A single hypertext document can
contain links to many documents. In the context of the Web, words or
graphics may serve as links to other documents, images, video, and
sound. Links may or may not follow a logical path, as each connection is
created by the author of the source document. Overall, the Web contains
a complex virtual web of connections among a vast number of documents,
images, videos, and sounds.
Producing hypertext for the Web is accomplished by creating documents
with a language called Hypertext Markup Language, or HTML.
With HTML, tags are placed within the text to accomplish document
formatting, visual features such as font size, italics and bold, and the
creation of hypertext links.
<p> This is a paragraph that shows the underlying HTML code. <strong>This sentence is rendered in bold text</strong>. <em>This sentence is rendered in italic text.</em> </p>
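To see the tags in the paragraph above the way a program sees them, here is a minimal Python sketch using the standard library's html.parser module. The class name TagLister is made up, and the final link element in SAMPLE is an addition for illustration; it is not part of the tutorial's own snippet.

```python
from html.parser import HTMLParser

# The tutorial's paragraph, plus one hypothetical link added for this sketch.
SAMPLE = (
    "<p> This is a paragraph that shows the underlying HTML code. "
    "<strong>This sentence is rendered in bold text</strong>. "
    "<em>This sentence is rendered in italic text.</em> "
    '<a href="http://www.w3.org/">This phrase is a link.</a> </p>'
)


class TagLister(HTMLParser):
    """Collects every opening tag and the target of every hypertext link."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            # attrs is a list of (name, value) pairs for this tag
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


lister = TagLister()
lister.feed(SAMPLE)
print(lister.tags)
print(lister.links)
```

The formatting tags (strong, em) and the link tag (a) arrive through the same mechanism; a browser simply reacts to each one differently.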
HTML is an evolving language, with new tags being added as each upgrade
of the language is developed and released. Nowadays, design features are
often separated from the content of the HTML page and placed into
Cascading Style Sheets (CSS).
This practice has several advantages, including the fact that an
external style sheet can centrally control the design of multiple pages.
The World Wide Web Consortium (W3C), founded by Web inventor Tim
Berners-Lee, coordinates the efforts of standardizing HTML. The W3C has
also defined XHTML, a reformulation of HTML as an application of the
XML language standard.
Pages on the Web
The backbone of the World Wide Web is its files, called pages or Web
pages, containing information and links to resources - both text and
multimedia - throughout the Internet.
Web pages can be created by user activity. For example, if you visit a
Web search engine and enter keywords on the topic of your choice, a page
will be created containing the results of your search. In fact, a
growing amount of information found on the Web today is served from
databases, creating temporary Web pages "on the fly" in response to user
searches. You can see an example of such a page below, taken from the
search engine Hakia. This page only exists as a result of a search.
Access to Web pages can be accomplished in all sorts of ways, including:
- Entering a Web address into your browser and retrieving a page directly
- Browsing through sites and selecting links to move from one page to another both within and beyond the site
- Doing a search on a search engine to retrieve pages on the topic of your choice (See: The World of Search Engines)
- Searching through directories containing links to organized collections of Web pages (See: The World of Subject Directories)
- Clicking on links within e-mail messages
- Using apps on social networking sites or your mobile phone to access Web and other online content
- Retrieving updates via RSS feeds and clicking on links within these feeds (See: RSS Basics)
Retrieving files on the Web: the URL and Domain Name System
URL stands for Uniform Resource Locator. The URL specifies the Internet
address of a file stored on a host computer, or server, connected to the
Internet. Web browsers use the URL to retrieve the file from the server.
This file is downloaded to the user's computer, or client, and displayed
on the monitor connected to the machine. Because of this relationship
between clients and servers, the Web is a client-server network.
Underlying the functionality of a URL is a base numeric address that
points to the computer that hosts the file. This numeric address is
called the IP (Internet Protocol) address. The host portion of a URL is
translated into its corresponding IP address using the Domain Name
System (DNS). The DNS is a worldwide system of servers that stores
location pointers to the computers that host networked files. Since
numeric strings are difficult for humans to use, alphanumeric addresses
are employed by users. Once the translation is made by the DNS, the
browser can contact the server and ask for the specific file designated
in the URL.
For example, the DNS translates www.microsoft.com into the IP address
207.46.19.254.
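The name-to-number translation can be tried from Python through the operating system's resolver. This is only a sketch: the helper name resolve is hypothetical, and it looks up localhost so that it works without network access. Passing a public host name instead requires a connection, and the address returned may differ from the one quoted above, since addresses change over time.

```python
import ipaddress
import socket


def resolve(host):
    """Return the IPv4 address the system resolver reports for a host name."""
    return socket.gethostbyname(host)


address = resolve("localhost")
print(address)
# ipaddress lets us inspect the numeric address once we have it
print(ipaddress.ip_address(address).is_loopback)
```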
Anatomy of a URL
Every file on the Internet, no matter what its protocol, has a unique
URL. Each URL points to a specific file located in a specific directory
on the host machine. This is the format of a URL:
protocol://host/path/filename
For example, this is the URL on the U.S. Senate site for a live video stream sent by a camera pointed at the U.S. Capitol:
http://www.senate.gov/general/capcam.htm
This URL is typical of addresses hosted in domains in the United States. The structure of this URL is shown below.
- Protocol: http
- Host computer name: www
- Second-level domain name: senate
- Top-level domain name: gov
- Directory name: general
- File name: capcam.htm
Note how much information about the content of the file is present in this well-constructed URL.
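The same breakdown can be sketched in Python with the standard urllib.parse module. The function name dissect and the dictionary keys are invented for this example, and the host-name split assumes a simple three-part name such as www.senate.gov; real host names can have more or fewer parts.

```python
import posixpath
from urllib.parse import urlsplit


def dissect(url):
    """Split a URL into the parts named in the list above (simplified)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")          # e.g. ["www", "senate", "gov"]
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "host_computer": labels[0],
        "second_level_domain": labels[-2],
        "top_level_domain": labels[-1],
        "directory": directory.strip("/"),
        "filename": filename,
    }


info = dissect("http://www.senate.gov/general/capcam.htm")
for key, value in info.items():
    print(f"{key}: {value}")
```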
Several generic top-level domains (gTLDs) are common in the United States:
com | commercial enterprise |
edu | educational institution |
gov | U.S. government entity |
mil | U.S. military entity |
net | network access provider |
org | usually nonprofit organizations |
In addition, dozens of domain names have been assigned to identify and
locate files stored on servers in countries around the world. These are
referred to as country codes, and have been standardized by the
International Organization for Standardization as ISO 3166. For example:
ch | Switzerland |
de | Germany |
jp | Japan |
uk | United Kingdom |
Additional top-level domain names were approved in 2000 by the Internet
Corporation for Assigned Names and Numbers (ICANN): .biz, .museum,
.info, .pro (for professionals), .name (for individuals), .aero (for the
aerospace industry), and .coop (for cooperatives). Some country codes
have been marketed for other purposes, for example, .tv (the country
code of Tuvalu) for sites that offer content similar to television
broadcasts. In 2011, ICANN decided to open up domain names without
restriction, including in any language or written script. Establishing
and maintaining a new name is expensive - $185,000 for the application
fee alone - so the actual effect of this change may be limited.
As the technology of the Web evolves, URLs have become more complex.
This is especially the case when content is retrieved from databases and
served onto Web pages. The resulting URLs can have a variety of
elaborate structures, for example,
http://spills.incidentnews.gov/incidentnews/FMPro?-db=images&-Format=maps.htm
&SpillLink=8&Subject=Waterway%20Closure%20Map&-SortField=EntryDate&
-SortOrder=descend&-SortField=EntryTime&-SortOrder=descend&-Token=8&
-Max=20&-Find
The first part of this URL looks familiar. What follows are search
elements that query the database and determine the order of the results.
As a growing number of databases serve content to the Web, these types
of URLs are appearing more commonly in your browser's address window.
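To see those search elements one by one, the long URL above can be taken apart with Python's urllib.parse. This is a sketch only; it shows what the URL contains and does not contact the server or explain how its database interprets each field.

```python
from urllib.parse import parse_qsl, urlsplit

# The URL from the text, joined back into a single string.
URL = (
    "http://spills.incidentnews.gov/incidentnews/FMPro?-db=images&-Format=maps.htm"
    "&SpillLink=8&Subject=Waterway%20Closure%20Map&-SortField=EntryDate&"
    "-SortOrder=descend&-SortField=EntryTime&-SortOrder=descend&-Token=8&"
    "-Max=20&-Find"
)

parts = urlsplit(URL)
# keep_blank_values=True keeps fields such as "-Find" that carry no value
pairs = parse_qsl(parts.query, keep_blank_values=True)

print(parts.netloc, parts.path)
for name, value in pairs:
    print(f"{name} = {value!r}")
```

Everything before the question mark is the familiar protocol, host, and path; everything after it is a list of name and value pairs, with %20 standing in for a space.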
Programming languages and environments
The use of programming languages beyond HTML extends the capabilities of
the Web. They are used to write software, process Web forms, fetch and
display data, and perform all kinds of advanced functions. It is
difficult to talk about these languages without getting into too much
technical jargon, but here is an attempt. What follows is a brief guide
to some of the more common languages in use on the Web today.
CGI (Common Gateway Interface) refers to a
specification by which programs can communicate with a Web server. A CGI
program, or script, is any program designed to process data that
conforms to the CGI specification. The program can be written in any
programming language, including C, Perl, and Visual Basic Script
(VBScript). In the early days of the Web, CGI scripts were commonly used
to process a form on a Web page. Perl, long a common choice for CGI
scripts, is also the language of the Movable Type blog platform.
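A CGI program can be very small. The Python sketch below assumes a Web server configured to run it as a CGI script: the server passes the form data in the QUERY_STRING environment variable, and the script answers with a header, a blank line, and an HTML body. The function name handle_request and the form field called name are hypothetical.

```python
import os
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI response from the environment variables the server sets."""
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    name = fields.get("name", ["stranger"])[0]
    # escape() keeps user-supplied text from being treated as HTML tags
    body = f"<p>Hello, {escape(name)}!</p>"
    # CGI output: headers first, then a blank line, then the document.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    print(handle_request(os.environ), end="")
```

Run outside a Web server, the script finds no QUERY_STRING and falls back to the default greeting.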
Active Server Pages (ASP): Developed by Microsoft, ASP
is a programming environment that processes scripts on a Web server. The
programming language VBScript is often used for the scripting.
Lightweight programs can be written with this language. Active Server
Pages end in the file extension .asp. For an example, check out
Databases and Indexes at the University at Albany Libraries.
.NET framework: Also developed by Microsoft, this
development framework is a more powerful one than ASP for writing
applications for the Web. Programming languages include C# and VB.NET.
ASP.NET is a related environment, producing pages with the file
extension .aspx. The Microsoft site is a good example of a site created
with the .NET framework.
PHP: This is another server-based language. It is
frequently the language used to write open source (i.e., freely
available and often community-created) programs found on the Web,
including MediaWiki (the software that runs Wikipedia) and the popular
blog software WordPress.
While PHP functionality can be installed on Windows servers, it is
native to the Linux server environment and commonly used there.
Java/Java Applets: Java is a programming language
similar to C++. Developed by Sun Microsystems, the aim of Java is to
create programs that will be platform independent. The Java motto is,
"Write once, run anywhere." A perfect Java program should work equally
well on a Windows, Apple, Unix, or Linux server, and so on, without any
additional programming. This goal has yet to be realized. Java can be
used to write applications for both Web and non-Web use.
Web-based Java applications have often taken the form of Java applets.
These are small Java programs fetched from within a Web page that can
be downloaded from a server and run in a Java-compatible Web browser. On
the server side, Java servlets and JavaServer Pages generate Web pages;
a JavaServer Page has the file extension .jsp.
JavaScript is a very popular programming language
created by Netscape Communications. Small programs written in this
language are embedded within a Web page, or fetched externally from
within the page, to enhance the page's functionality. Examples of
JavaScript include drop-down menus, image displays, and mouse-over
interactions. The drop-down menus on the site of the UCLA Library shown
below are a good example: when you hover your mouse over the menu item, a
set of sub-menus opens up below.
XML: XML (eXtensible Markup Language) is a mark-up
language that enables Web designers to create customized tags to provide
functionality not available with HTML alone. XML is a language of data
structure and exchange, and allows developers to separate form from
content. With XML, the same content can be formatted for multiple
applications. In May 1999, the W3C announced that HTML 4.0 had
been recast as an XML application called XHTML.
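The idea of customized tags and of one body of content serving several presentations can be sketched in Python with the standard xml.etree.ElementTree module. The tag names (tutorial, section, heading, term) and the two formatting functions are invented for this example.

```python
import xml.etree.ElementTree as ET

# A small document using made-up tags that describe content, not appearance.
DOC = """\
<tutorial>
  <section id="protocols">
    <heading>Protocols of the Web</heading>
    <term>HTTP</term>
    <term>FTP</term>
  </section>
  <section id="pages">
    <heading>Pages on the Web</heading>
    <term>Web page</term>
  </section>
</tutorial>
"""

root = ET.fromstring(DOC)


def as_text(root):
    """Format the content as a plain-text outline."""
    lines = []
    for section in root.findall("section"):
        lines.append(section.findtext("heading"))
        for term in section.findall("term"):
            lines.append("  - " + term.text)
    return "\n".join(lines)


def as_html(root):
    """Format the same content as an HTML list."""
    items = []
    for section in root.findall("section"):
        terms = ", ".join(t.text for t in section.findall("term"))
        items.append(f"<li><strong>{section.findtext('heading')}</strong>: {terms}</li>")
    return "<ul>" + "".join(items) + "</ul>"


print(as_text(root))
print(as_html(root))
```

Neither function changes the XML itself, which is the sense in which form is kept separate from content.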
AJAX stands for Asynchronous JavaScript and XML. This
technique is used to create interactive Web applications. Its premise is
that it sends data to the browser behind the scenes, so that when it is
time to view the information, it is already "there." Google Maps is a well-known example of AJAX.
A different kind of example can be found with SurfWax LookAhead, an RSS search tool that retrieves feeds as you type your search.
SQL (Structured Query Language): This is a language
that focuses on extracting data from databases. Programmers write
statements called queries that retrieve data from the tables in the
database. Some Web sites are created extensively or entirely from data
stored in database tables. You can often tell that a SQL query has
produced data on a page by the presence of a question mark (?) and a
record number in the URL, as the example below illustrates.
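As a hedged sketch of how a record number in a URL can turn into a SQL query, the following Python example uses the standard sqlite3 module with an in-memory database. The table name pages, its contents, the parameter name id, and the example.com address are all made up for illustration.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit


def build_database():
    """Create a throwaway in-memory database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO pages (id, title) VALUES (?, ?)",
        [(1, "Protocols of the Web"), (2, "Pages on the Web"), (3, "Anatomy of a URL")],
    )
    return conn


def title_for_url(conn, url):
    """Read the record number after the '?' and look it up with a query."""
    query = parse_qs(urlsplit(url).query)
    record_id = int(query["id"][0])
    # The '?' placeholder passes the value safely instead of pasting it into the SQL text.
    row = conn.execute("SELECT title FROM pages WHERE id = ?", (record_id,)).fetchone()
    return row[0] if row else None


conn = build_database()
print(title_for_url(conn, "http://www.example.com/page?id=2"))
```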
Mashups
Programs on the Web can be flexible. Sometimes they are combined with
each other to form enhanced presentations. These are known as mashups.
A mashup is a Web application or Web page that combines data from two or
more external sources. Mashups give you access in one place to
information available in multiple places.
There are all kinds of mashups on the Web. One example is Earthquakes In The Last Week, a mashup derived from data from the U.S. Geological Survey along with Google Maps. Another is Mashpedia, a mashup of the Wikipedia encyclopedia along with current information gathered from the social Web.
Last but not least: Applications (apps)
Applications, commonly called apps, are small programs that run within
various online environments. These
programs allow you to enjoy functionalities that enhance your experience
within that environment.
Social networking sites often make use of apps. For example, Facebook is
well-known for featuring thousands of apps created by Facebook or
outside developers. These apps allow you to play games, shop, form
issues-based communities, find family or classmates, etc.
Mobile phones are another environment within which apps are both popular
and useful. In fact, no decent mobile phone these days comes without
the option to add apps. A good example is the iPhone, which offers hundreds of thousands of apps in all sorts of areas, from work and education to travel, lifestyle, entertainment, and so on. Also take a look at the Android Market site to browse the apps available for Android phones. It is safe to say that apps make the mobile phone what it is today.
Apps are a very fast-growing area of the networked experience. Some
observers believe that apps will be a focus of developments in the
online world in the coming years.