Python Text, HTML, and Style Sheets


Topics

  1. Python Strings
  2. Python Lists
  3. Reading and Writing to a Text File in Python
  4. Basic HTML
  5. References
  6. Exercise



1. Python Strings

A string in Python is a sequence of characters. It is stored in contiguous memory as an array. We will take a look at how to create a string and how to access its individual characters.

1.1 Different Quotes

Strings are defined as characters inside quote marks. There are three quote marks used in Python:

  1. single quotes
  2. double quotes
  3. triple quotes

You can even nest quotes. For instance, if you start a string with a single quote, you can use as many double quotes inside it as you want, because Python treats everything up to the next single quote as part of the string. The following code creates two strings with nested quotes.

def nestString():
  doubleOut="Let's try to nest a 'single' quote in here"
  singleOut='Let us try to nest a "double" quote in here'
  print doubleOut
  print singleOut

Triple quotes are probably new to you. Use them when you want a string to include new lines, spaces, and tabs exactly as you typed them. The following example includes several newline characters:

def tripleString():
  print """Notice how we can type
  on different lines and Python will
  accept it as one string.
  
  Pretty cool, eh?"""
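
One way to check what the triple quotes are doing is to compare a triple-quoted string with the same text written on one line. The sketch below uses a hypothetical variable name, poem, and calls print with a single parenthesized argument so that it runs under both the Python 2 style used in this lab and Python 3.

```python
# A minimal sketch; 'poem' and 'same' are hypothetical names for illustration.
poem = """Line one
Line two"""

# The line break typed between the triple quotes is stored as "\n".
same = "Line one\nLine two"

print(poem == same)            # the two spellings build the same string
print(len(poem.split("\n")))   # number of lines held in the string
```

In other words, the triple-quoted form is a convenience for typing; the string itself just contains newline characters.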

1.2 Accessing Each Character

Strings are stored as arrays of characters as demonstrated in the diagram below:

Hello Array

You can access individual characters using the following syntax.

def charString():
  for i in "Hello":
    print i

Or, if you would like to move through the string by indexing into its elements, you can do that with the following code.

def arrayString():
  hello="Hello"
  for i in range(len(hello)):
    print hello[i]
  print hello[5] #Error is produced!

Note that the indices run from 0 to 4, produced using the range() and len() functions: len(hello) returns 5, and range(5) makes a list from 0 to 4. On each iteration, i takes the next value in that list. The last statement is included to show that an error (an IndexError) is produced if you try to access element 5.
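
The index limits described above can also be checked without crashing the program. The following is a small sketch, not part of the original lab code; the names last_index and out_of_range are made up for illustration.

```python
hello = "Hello"

# Valid indices run from 0 to len(hello) - 1.
last_index = len(hello) - 1
print(hello[last_index])       # the final character of the string

# Asking for index 5 raises IndexError, which can be caught with try/except.
try:
    print(hello[5])
    out_of_range = False
except IndexError:
    out_of_range = True

print(out_of_range)
```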

At a low level, each character (in this JES environment) is encoded in Unicode, which uses two bytes. By contrast, ASCII uses one byte. Why would you want to use Unicode?

If you would like to look at all the possible Unicode values, see http://www.unicode.org/charts. The characters used in English are under the Latin link. For instance, the Unicode value for 'H' is 48 in hexadecimal, which translates to 4x16+8=72 in decimal.
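
The hexadecimal arithmetic for 'H' can be confirmed with a few built-in functions. This is only a quick check, written with single-argument print calls so it runs in either Python 2 or Python 3.

```python
# 0x48 is how Python writes the hexadecimal number 48.
print(0x48 == 4 * 16 + 8)   # compares the literal with the hand calculation

print(int("48", 16))        # parse the text "48" as a base-16 number
print(ord("H"))             # numeric value of the character 'H'
print(chr(72))              # and back from the number to the character
```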

The following code prints out the decimal and hexadecimal value for each character in a string that is passed as an argument.

def unicodeString(var):
  for char in var:
    print ord(char),"Or in hex: ", hex(ord(char))

Remember, to the computer, each character is just a series of 1's and 0's that are interpreted as text, but because this class is about "multimedia", we could interpret those 1's and 0's as a picture or sound. No guarantee on the quality, though! :)


2. Python Lists

Lists are a powerful structure that can contain just about anything. The next sections focus on how to create a list and how to randomly pick an element from a list.

2.1 Creating Lists

Lists are created by putting elements in square brackets. The code to generate a list and add to it is shown below.

def createList():
  myList = ["This", "is", "the", "number", 12]
  print myList
  myList = myList + ["and","a", ["sub","list"]]
  print myList
  print myList[4]
  print myList[0]
  print myList[7]
  print myList[7][0]

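Notice how you can mix strings and numbers in one list, extend a list with +, access an element by its index, and nest one list inside another.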

How do you think that you would access an individual character within one of the words? For instance, try accessing the 'i' in "This".

2.2 Randomly Generating Sentences

If you have a list, you can randomly pick one element from it using the function random.choice(). The following code creates a random sentence from one randomly chosen noun, verb, and phrase, each taken from its own list.

import random

#program 97 on page 264 
#of Computing and Programming in Python, a Multimedia Approach
def sentence():
  nouns = ["Mark", "Adam", "Angela", "Larry", "Jose", "Matt", "Jim"]
  verbs = ["runs", "skips", "sings", "leaps", "jumps", "climbs", "argues", "giggles"]
  phrases = ["in a tree", "over a log", "very loudly", "around the bush", 
  "while reading the newspaper", "very badly","while skipping", "instead of grading", 
  "while typing on the CoWeb"]
  print random.choice(nouns), random.choice(verbs), random.choice(phrases)

Note that before you can use random.choice(), you must import random, the module that contains random functions.
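
A variation on the same idea, offered as a sketch rather than as the textbook's program: build the sentence as one string and return it, so it can be stored or written to a file later. The shortened word lists, the function name make_sentence, and the seed value 325 are all assumptions made for this illustration.

```python
import random

# Hypothetical short word lists, in the spirit of the example above.
nouns = ["Mark", "Angela", "Jose"]
verbs = ["runs", "sings", "giggles"]
phrases = ["in a tree", "very loudly"]

def make_sentence():
    # Join the three random picks into a single string instead of printing them.
    return random.choice(nouns) + " " + random.choice(verbs) + " " + random.choice(phrases)

# Optional: seeding the generator with a fixed number repeats the same picks.
random.seed(325)
first = make_sentence()
random.seed(325)
second = make_sentence()

print(first)
print(first == second)
```

Without the calls to random.seed(), each call to make_sentence() is free to produce a different sentence.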


3. Reading and Writing to a Text File in Python

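There might be times when you want to read and write text to a file. When you open a file, you have to specify whether you will be reading or writing, and whether the file holds text or binary data. The mode strings are "rt" (read text), "wt" (write text), "rb" (read binary), and "wb" (write binary).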

The code below opens a file for writing, closes it, opens the file for reading, and then displays the contents.

def openFile():
  file=open("/Users/yourusername/Desktop/PythonText/new.html", "wt")
  file.write("""And here is some content for this file of yours
  maybe it will work and maybe it won't but
  we will try""")
  file.close()
  file=open("/Users/yourusername/Desktop/PythonText/new.html","rt")
  print file.read()
  file.close()

Notice that file.read() reads the entire contents of the file, and file.write() writes everything between the triple quotes to the file. Instead of the single write statement, you could have a series of file.write() calls. Remember to use file.close() when you are finished reading or writing.
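
As a sketch of the "series of file.write() calls" idea, the code below writes three pieces of text one call at a time and then reads them back. It assumes a temporary folder created with the standard tempfile module in place of the Desktop path used in the lab, and the names lines, out, and inp are made up for this example.

```python
import os
import tempfile

# Assumption: a temporary folder stands in for the lab's Desktop path.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "new.html")

lines = ["And here is some content\n", "written one call\n", "at a time"]

out = open(path, "wt")
for line in lines:
    out.write(line)    # each call continues where the previous one stopped
out.close()

inp = open(path, "rt")
contents = inp.read()  # read() returns the whole file as one string
inp.close()

print(contents)
```

Note that write() does not add a line break on its own; the "\n" characters in the strings are what separate the lines.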


4. Basic HTML

Text is just a series of ones and zeros that are interpreted as characters (using ASCII or Unicode). You might be asking, "How do you get a different font or style?" Word processors add that information in the form of style runs.

"A style run is a separate representation of the font and style information with indices into the string to show where the changes should take place. For example, The old brown fox runs might be encoded as [[bold 0 6][italics 8 12]]"

(from page 253 of Computing and Programming in Python, a Multimedia Approach, by Mark J. Guzdial and Barbara Ericson)
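
To make the quoted encoding concrete, here is a small sketch that looks up which characters each style run covers. The list-of-lists representation and the function name styled_pieces are assumptions made for this illustration, as is the reading of both indices as inclusive; the quote itself does not specify a data structure.

```python
text = "The old brown fox runs"

# Hypothetical representation: [style, first index, last index],
# with both indices treated as inclusive.
runs = [["bold", 0, 6], ["italics", 8, 12]]

def styled_pieces(text, runs):
    pieces = []
    for style, first, last in runs:
        # last + 1 because a Python slice stops just before its end index.
        pieces.append((style, text[first:last + 1]))
    return pieces

for style, piece in styled_pieces(text, runs):
    print(style + ": " + piece)
```

The text itself is never changed; the style information lives in a separate list that only points into the string.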

For webpages, HTML tags and Cascading Style Sheets are used to determine the font and style. The next sections will focus on the basic concepts of HTML and Cascading Style Sheets and how we can use Python to generate both.

4.1 Basic Tags

Underlying every webpage are tags that tell the browser how to interpret the text. All tags are written between angle brackets (< >). The general rule is that every opening tag has a corresponding closing tag, where the closing tag has a slash (/) in front of its name.

For instance, to indicate that something is a heading (large, bold font), you would start the text with an opening <h1> and end it with a </h1>. The following is a very short list of tags that you will see in this lab (modified from the CS215 lab notes):

Tag                      Description

Document Tags
<html> . . . </html>     Indicates the start and end of the HTML file.
<head> . . . </head>     Indicates the start and end of the file header.
<title> . . . </title>   Displays a title line in the browser's title bar.
<body> . . . </body>     Indicates the beginning and end of the body.

Format Tags
<h1> . . . </h1>         Specifies a large heading. Also available are h2, h3, h4, h5, and h6 (progressively smaller).
<p> . . . </p>           Indicates a new paragraph.
<br />                   Indicates a new line, i.e., starts the text at the beginning of the next line. Note that there is nothing inside this tag; it is the opening and closing tag all rolled into one.

The following is an example of the code behind a basic webpage:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>Nova's Home Page</title>
  </head>
  <body>
    <h1>Welcome to Nova's Home Page</h1>
    <p>Hi! I am Nova. This is my home page!
    I am interested in photography </p>
  </body>
</html>

In case you were curious, the <!DOCTYPE...> tag announces the kind of page this is.

When the browser displays the above code, it looks like this:

Basic Webpage

The corresponding tags are shown above in red font. Notice how the tags influence the size and style of the text.

4.2 Using Python to Generate HTML

You can generate your own home page with your own name and interest using Python:

#taken from program 109 on page 294 
#of Computing and Programming in Python, a Multimedia Approach

def makeHomePage(name, interest):
  file=open("/Users/yourusername/Desktop/PythonText/homepage.html", "wt")
  file.write("""<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>""" + name + """'s Home Page</title>
    <link rel="stylesheet" href="myCs325Style.css" type="text/css"/>
  </head>
  <body>
    <h1>Welcome to """+name+"""'s Home Page</h1>
    <p>Hi! I am """+name+""". This is my home page!
    I am interested in """ + interest + """ </p>
  </body>
</html>""")
  file.close()

Notice how there is a call to open the file for writing text ("wt") and only one call to file.write(). The contents of the basic webpage with variables inserted for the name and interest are between triple quotes. Voila, instant webpage!
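
If you want to inspect the generated page before writing it to disk, one option is to build the text in a separate function and print it. The sketch below is a simplified variation, not the textbook program: the function name buildHomePage is hypothetical, and the DOCTYPE and link lines are left out to keep it short.

```python
def buildHomePage(name, interest):
    # Hypothetical helper: returns the page text instead of writing a file.
    return ("<html>\n"
            "  <head>\n"
            "    <title>" + name + "'s Home Page</title>\n"
            "  </head>\n"
            "  <body>\n"
            "    <h1>Welcome to " + name + "'s Home Page</h1>\n"
            "    <p>Hi! I am " + name + ". I am interested in " + interest + "</p>\n"
            "  </body>\n"
            "</html>")

page = buildHomePage("Nova", "photography")
print(page)
```

The returned string could then be handed to a single file.write() call, exactly as in makeHomePage().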

4.3 Basic Cascading Style Sheets

Did anyone notice the extra line that was inserted into the "Python version" of the webpage? There was a deliberate addition of <link.../>. This tells the browser to look for a file called "myCs325Style.css" for information about the style of the webpage.

A sample style sheet is the following:

body {background-color: #DDDDDD}
h1 { color: #FF0000;
     font-family: Arial, Helvetica, sans-serif;
     border:thin dotted #AA0000;
   }
p  { color: #0000FF;
     font-family:"Times New Roman", Times, serif;
     font-size: 20px;
   }

Without getting into the details of Cascading Style Sheets, the idea is that anything between the body tags will have a gray (DDDDDD) background (the entire webpage in this case). Anything between h1 tags will have a red font (FF0000) in Arial, Helvetica, or another sans-serif typeface, and will be enclosed in a thin, dotted border. What font color and style will be applied to paragraphs?

Notice the syntax of this style sheet. It contains curly brackets ({}), colons (:), and semicolons (;).
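
The punctuation pattern can be spelled out in a few lines of Python. This is a sketch only: the helper name cssRule and the choice to pass the properties as a list of (name, value) pairs are assumptions made for illustration.

```python
def cssRule(selector, properties):
    # Hypothetical helper: 'properties' is a list of (name, value) pairs.
    rule = selector + " {\n"                         # curly bracket opens the rule
    for name, value in properties:
        rule += "  " + name + ": " + value + ";\n"   # colon after the name, semicolon at the end
    return rule + "}"                                # curly bracket closes the rule

rule = cssRule("h1", [("color", "#FF0000"), ("border", "thin dotted #AA0000")])
print(rule)
```

Each rule therefore has the shape: a selector, an opening curly bracket, one "name: value;" line per property, and a closing curly bracket.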

With this style sheet applied, the webpage looks like this:
Webpage with Style Sheet

To emphasize once again, we have not made any changes to the content of the webpage; we have only added a style that goes with the tags.

As a side note, have a look and see if you can spot the difference between the sans-serif font (h1) and the serif font (p).

4.4 Using Python to Generate CSS

As you might have guessed by now, you can use Python to generate that exact same style sheet from above.

def makeStyleSheet():
  file=open("/Users/yourusername/Desktop/PythonText/myCs325Style.css", "wt")
  file.write("""body {background-color: #DDDDDD}
h1 { color: #FF0000;
     font-family: Arial, Helvetica, sans-serif;
     border:thin dotted #AA0000;
   }
p  { color: #0000FF;
     font-family:"Times New Roman", Times, serif;
     font-size: 20px;
   }""")
  file.close()

Try it!


5. References


6. Exercise

General Idea

We are trying to emphasize the idea that style and content (the words on our page) are separate. Style sheets are a good way of showing this. When we change the style sheet, our webpage display can be drastically different. This exercise asks you to randomly generate a style sheet using Python so that the font is readable on a background color.

More Details