XML is the Extensible Markup Language. It improves the functionality of the Web by letting you identify your information in a more accurate, flexible, and adaptable way.
It is extensible because it is not a fixed format like HTML (which is a single, predefined markup language). Instead, XML is actually a metalanguage—a language for describing other languages—which lets you design your own markup languages for limitless different types of documents. XML can do this because it’s written in SGML, the international standard metalanguage for text document markup (ISO 8879).
Sessions are the server-side counterpart of cookies. While a cookie persists data (or state) at the client, a session persists it at the server. The advantage is that the session data does not travel over the network, which makes sessions both safer and faster, although this is not entirely true, as shown in the next paragraph.
The session state is kept in a file or in a database on the server. Each session is identified by a session id (SID). So that the client can identify itself to the server, the SID must be created by the server, sent to the client, and then sent back to the server whenever the client makes a request. So there is still data going over the network: the SID.
The server can send the SID to the client in a link’s query string or in a hidden form field or as a Set-Cookie header. The SID can be sent back from the client to the server as a query string parameter or in the body of the HTTP message if the post method is used or in a Cookie HTTP header.
If a cookie is not used to store the SID then the session will only last until the browser is closed, or the user goes to another site breaking the POST or query string transmission, or in other words, the session will last only until the user leaves the site.
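The three ways of handing the SID to the client can be sketched as follows. This is only an illustration in Python 3, where http.cookies is the current name of the Cookie module used in this text; the SID value and the page.py script name are made up for the example.

```python
from html import escape
from http.cookies import SimpleCookie
from urllib.parse import urlencode

sid = "0123abcd"  # hypothetical SID, for illustration only

# 1. As a Set-Cookie header
cookie = SimpleCookie()
cookie["sid"] = sid
set_cookie_header = cookie.output()

# 2. In a link's query string (page.py is a made-up script name)
link = "page.py?" + urlencode({"sid": sid})

# 3. As a hidden form field
hidden_field = '<input type="hidden" name="sid" value="%s">' % escape(sid, quote=True)

print(set_cookie_header)
print(link)
print(hidden_field)
```

Only the first form survives a closed browser or a typed-in address, which is why the cookie variant is treated first.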
* Cookie Based SID:
A cookie based session has the advantage that it lasts until the cookie expires and, as only the SID travels the net, it is faster and safer. The disadvantage is that the client must have cookies enabled.
The only particularity with the cookie used to set a session is its value:
# The sid will be a hash of the server time
sid = sha.new(repr(time.time())).hexdigest()
Hashing the server time gives a SID that is unique to each session, provided no two sessions are created at exactly the same instant.
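A time-based SID is also easy to guess. As a hedged alternative (an assumption of this sketch, not what the scripts on this page do), Python 3 offers the secrets module for unpredictable tokens. Here hashlib.sha1 stands in for the old sha module, and both function names are invented for the example.

```python
import hashlib
import secrets
import time

def time_based_sid():
    """SID as described in the text: SHA-1 hex digest of the server time."""
    return hashlib.sha1(repr(time.time()).encode("ascii")).hexdigest()

def random_sid():
    """Alternative: 20 random bytes rendered as 40 hex characters."""
    return secrets.token_hex(20)

print(time_based_sid())
print(random_sid())
```

Both functions return 40 hexadecimal characters, so either can be dropped into the cookie value unchanged.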
#!/usr/bin/env python
import sha, time, Cookie, os

cookie = Cookie.SimpleCookie()
string_cookie = os.environ.get('HTTP_COOKIE')

# If new session
if not string_cookie:
    # The sid will be a hash of the server time
    sid = sha.new(repr(time.time())).hexdigest()
    # Set the sid in the cookie
    cookie['sid'] = sid
    # Will expire in a year
    cookie['sid']['expires'] = 12 * 30 * 24 * 60 * 60
# If already existent session
else:
    cookie.load(string_cookie)
    sid = cookie['sid'].value

print cookie
print 'Content-Type: text/html\n'
print '<html><body>'
if string_cookie:
    print '<p>Already existent session</p>'
else:
    print '<p>New session</p>'
print '<p>SID =', sid, '</p>'
print '</body></html>'
In every page the existence of the cookie must be tested. If it does not exist then redirect to a login page or just create it if a login or a previous state is not required.
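The per-page test can be sketched as a small helper. This is a Python 3 illustration under assumptions: get_or_create_sid is a made-up name, the environment is passed in as a plain mapping, and a random token replaces the time hash. It also covers the case where the browser sends other cookies but no sid.

```python
import secrets
from http.cookies import SimpleCookie

def get_or_create_sid(environ):
    """Return (sid, is_new) from a CGI-style environment mapping."""
    cookie = SimpleCookie()
    cookie.load(environ.get("HTTP_COOKIE", ""))
    if "sid" in cookie:
        # Existing session: reuse the SID sent by the browser
        return cookie["sid"].value, False
    # No SID cookie: start a new session (or redirect to a login page)
    return secrets.token_hex(20), True

print(get_or_create_sid({"HTTP_COOKIE": "sid=abc123"}))
print(get_or_create_sid({"HTTP_COOKIE": "other=1"})[1])
```

In a real script the mapping would be os.environ, and the is_new flag decides whether to send a Set-Cookie header or redirect.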
* Query String SID;
Query string based session:
#!/usr/bin/env python
import sha, time, cgi, os

sid = cgi.FieldStorage().getfirst('sid')

if sid: # If session exists
    message = 'Already existent session'
else: # New session
    # The sid will be a hash of the server time
    sid = sha.new(repr(time.time())).hexdigest()
    message = 'New session'

qs = 'sid=' + sid

print """\
Content-Type: text/html\n
<html><body>
<p>%s</p>
<p>SID = %s</p>
<p><a href="./set_sid_qs.py?sid=%s">reload</a></p>
</body></html>
""" % (message, sid, sid)
To maintain a session you will have to append the query string to all the links in the page.
Save this file as set_sid_qs.py and run it two or more times. Try closing the browser and calling the page again: the session is gone. The same happens if the page address is typed in the address bar.
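Appending the SID to every link is easy to get wrong by hand when a link already has a query string or a fragment. A minimal Python 3 sketch of a helper for that follows; add_sid is a hypothetical name, and the URLs are examples only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_sid(url, sid):
    """Return url with sid=<sid> added to (or replaced in) its query string."""
    parts = urlsplit(url)
    # Keep every existing parameter except a stale sid
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "sid"]
    query.append(("sid", sid))
    return urlunsplit(parts._replace(query=urlencode(query)))

print(add_sid("set_sid_qs.py", "abc123"))
print(add_sid("page.py?lang=en#top", "abc123"))
```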
* Hidden Field SID;
The hidden form field SID is almost the same as the query string based one, sharing the same problems.
#!/usr/bin/env python
import sha, time, cgi, os

sid = cgi.FieldStorage().getfirst('sid')

if sid: # If session exists
    message = 'Already existent session'
else: # New session
    # The sid will be a hash of the server time
    sid = sha.new(repr(time.time())).hexdigest()
    message = 'New session'

qs = 'sid=' + sid

print """\
Content-Type: text/html\n
<html><body>
<p>%s</p>
<p>SID = %s</p>
<form method="post">
<input type="hidden" name="sid" value="%s">
<input type="submit" value="Submit">
</form>
</body></html>
""" % (message, sid, sid)
* The shelve module;
Having a SID is not enough. It is necessary to save the session state in a file or in a database. To save it into a file the shelve module is used. The shelve module opens a file and returns a dictionary like object which is readable and writable as a dictionary.
# The shelve module will persist the session data
# and expose it as a dictionary
session = shelve.open('/tmp/.session/sess_' + sid, writeback=True)
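The SID is part of the file name, which makes the file unique to the session. The Apache user must have read and write permission on the session files (660 would be ok) and must also be able to create files in their directory, which requires write and execute permission on the directory itself (for example 770).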
The values of the dictionary can be any Python object that can be pickled. The keys must be strings.
# Save the current time in the session
session['lastvisit'] = repr(time.time())
# Retrieve last visit time from the session
lastvisit = session.get('lastvisit')
The dictionary like object must be closed as any other file should be:
session.close()
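The open, read, write and close cycle can be tried outside a web server. The following Python 3 sketch simulates two consecutive requests. The temporary directory stands in for /tmp/.session and the SID is made up.

```python
import os
import shelve
import tempfile
import time

session_dir = tempfile.mkdtemp()   # stand-in for /tmp/.session
sid = "abc123"                     # hypothetical SID
path = os.path.join(session_dir, "sess_" + sid)

# First request: nothing is stored yet, so get() returns None
with shelve.open(path, writeback=True) as session:
    first = session.get("lastvisit")
    session["lastvisit"] = repr(time.time())

# Second request: the value written before is still there
with shelve.open(path) as session:
    second = session.get("lastvisit")

print(first, second)
```

The with statement closes the shelf on exit, which plays the role of the explicit session.close() call.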
* Cookie and Shelve;
A sample of how to make cookies and shelve work together keeping session state at the server side:
#!/usr/bin/env python
import sha, time, Cookie, os, shelve

cookie = Cookie.SimpleCookie()
string_cookie = os.environ.get('HTTP_COOKIE')

if not string_cookie:
    sid = sha.new(repr(time.time())).hexdigest()
    cookie['sid'] = sid
    message = 'New session'
else:
    cookie.load(string_cookie)
    sid = cookie['sid'].value
cookie['sid']['expires'] = 12 * 30 * 24 * 60 * 60

# The shelve module will persist the session data
# and expose it as a dictionary
session = shelve.open('/tmp/.session/sess_' + sid, writeback=True)

# Retrieve last visit time from the session
lastvisit = session.get('lastvisit')
if lastvisit:
    message = 'Welcome back. Your last visit was at ' + \
        time.asctime(time.gmtime(float(lastvisit)))
# Save the current time in the session
session['lastvisit'] = repr(time.time())

print """\
%s
Content-Type: text/html\n
<html><body>
<p>%s</p>
<p>SID = %s</p>
</body></html>
""" % (cookie, message, sid)

session.close()
It first checks whether a cookie is already set. If not, it creates a SID and assigns it to the cookie value. An expiration time of about one year is established.
The lastvisit data is what is maintained in the session.
HTTP is said to be a stateless protocol. What this means for web programmers is that every time a user loads a page it is the first time for the server. The server can’t say whether this user has ever visited the site, whether he is in the middle of a buying transaction, whether he has already authenticated, etc.
A cookie is a tag that can be placed on the user’s computer. Whenever the user loads a page from a site, the site’s script can send him a cookie. The cookie can contain anything the site needs to identify that user. With the next request the user makes for a page, the cookie goes back to the server with all the pertinent information, to be read by the script.
* Set the Cookie;
There are two basic cookie operations. The first is to set the cookie as an HTTP header to be sent to the client. The second is to read the cookie returned from the client also as an HTTP header.
This script will do the first one placing a cookie on the client’s browser:
#!/usr/bin/env python
import time

# This is the message that contains the cookie
# and will be sent in the HTTP header to the client
print 'Set-Cookie: lastvisit=' + str(time.time())

# To save one line of code
# we replaced the print command with a '\n'
print 'Content-Type: text/html\n'
# End of HTTP header

print '<html><body></body></html>'
The Set-Cookie header contains the cookie. Save and run this code from your browser and take a look at the cookie saved there. Search for the cookie name, lastvisit, or for the domain name, or the server IP like 10.1.1.1 or 127.0.0.1.
* The Cookie Object
The Cookie module can save us a lot of coding and errors and the next pages will use it in all cookie operations.
#!/usr/bin/env python
import time, Cookie

# Instantiate a SimpleCookie object
cookie = Cookie.SimpleCookie()

# The SimpleCookie instance is a mapping
cookie['lastvisit'] = str(time.time())

# Output the HTTP message containing the cookie
print cookie
print 'Content-Type: text/html\n'

print '<html><body></body></html>'
It does not seem like much for this extremely simple code, but wait until things get complex and the Cookie module will be your friend.
* Retrieve the Cookie;
The returned cookie will be available as a string in the os.environ dictionary with the key ‘HTTP_COOKIE’:
cookie_string = os.environ.get('HTTP_COOKIE')
The load() method of the SimpleCookie object will parse that string rebuilding the object’s mapping:
cookie.load(cookie_string)
Complete code:
#!/usr/bin/env python
import Cookie, os, time

cookie = Cookie.SimpleCookie()
cookie['lastvisit'] = str(time.time())

print cookie
print 'Content-Type: text/html\n'

print '<html><body>'
print '<p>Server time is', time.asctime(time.localtime()), '</p>'

# The returned cookie is available in the os.environ dictionary
cookie_string = os.environ.get('HTTP_COOKIE')

# The first time the page is run there will be no cookies
if not cookie_string:
    print '<p>First visit or cookies disabled</p>'
else: # Run the page twice to retrieve the cookie
    print '<p>The returned cookie string was "' + cookie_string + '"</p>'

    # load() parses the cookie string
    cookie.load(cookie_string)
    # Use the value attribute of the cookie to get it
    lastvisit = float(cookie['lastvisit'].value)

    print '<p>Your last visit was at',
    print time.asctime(time.localtime(lastvisit)), '</p>'

print '</body></html>'
When the client first loads the page there will be no cookie in the client’s computer to be returned. The second time the page is requested then the cookie saved in the last run will be sent to the server.
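The round trip can be reproduced without a browser. In this Python 3 sketch (http.cookies is the Python 3 name of the Cookie module) the browser's part is simulated by stripping the header name. The timestamp is the example value used elsewhere on this page.

```python
from http.cookies import SimpleCookie

# Server side, first request: build the Set-Cookie header
outgoing = SimpleCookie()
outgoing["lastvisit"] = "1159535133.33"
header = outgoing.output()

# The browser sends back only "name=value" in its Cookie header,
# which a CGI script sees as the HTTP_COOKIE environment variable
http_cookie = header[len("Set-Cookie: "):]

# Server side, second request: parse the string again
incoming = SimpleCookie()
incoming.load(http_cookie)
lastvisit = float(incoming["lastvisit"].value)

print(header)
print(lastvisit)
```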
* Morsels
In the previous cookie retrieve program the lastvisit cookie value was retrieved through its value attribute:
lastvisit = float(cookie['lastvisit'].value)
When a new key is set for a SimpleCookie object a Morsel instance is created:
>>> import Cookie
>>> import time
>>>
>>> cookie = Cookie.SimpleCookie()
>>> cookie
<SimpleCookie: >
>>>
>>> cookie['lastvisit'] = str(time.time())
>>> cookie['lastvisit']
<Morsel: lastvisit='1159535133.33'>
>>>
>>> cookie['lastvisit'].value
'1159535133.33'
Each cookie, a Morsel instance, can only have a predefined set of keys: expires, path, comment, domain, max-age, secure and version. Any other key will raise an exception.
#!/usr/bin/env python
import Cookie, time

cookie = Cookie.SimpleCookie()

# name/value pair
cookie['lastvisit'] = str(time.time())

# expires in x seconds after the cookie is output.
# the default is to expire when the browser is closed
cookie['lastvisit']['expires'] = 30 * 24 * 60 * 60

# path in which the cookie is valid.
# if set to '/' it will be valid in the whole domain.
# the default is the script's path.
cookie['lastvisit']['path'] = '/cgi-bin'

# the purpose of the cookie to be inspected by the user
cookie['lastvisit']['comment'] = 'holds the last user\'s visit date'

# domain in which the cookie is valid. always starts with a dot.
# to make it available in all subdomains
# specify only the domain like .my_site.com
cookie['lastvisit']['domain'] = '.www.my_site.com'

# discard in x seconds after the cookie is output
# not supported in most browsers
cookie['lastvisit']['max-age'] = 30 * 24 * 60 * 60

# secure has no value. If set directs the user agent to use
# only (unspecified) secure means to contact the origin
# server whenever it sends back this cookie.
# an empty string leaves the flag out of the output
cookie['lastvisit']['secure'] = ''

# a decimal integer, identifies to which version of
# the state management specification the cookie conforms.
cookie['lastvisit']['version'] = 1

print 'Content-Type: text/html\n'

print '<p>', cookie, '</p>'
for morsel in cookie:
    print '<p>', morsel, '=', cookie[morsel].value
    print '</p>'
Notice that print cookie automatically formats the expire date.
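Both behaviours, the automatic date formatting and the rejection of unknown keys, can be checked with a short Python 3 sketch using http.cookies, the Python 3 name of the Cookie module. The attribute name colour is deliberately invalid and made up for the example.

```python
from http.cookies import CookieError, SimpleCookie

cookie = SimpleCookie()
cookie["lastvisit"] = "1159535133.33"
cookie["lastvisit"]["expires"] = 30 * 24 * 60 * 60  # seconds from now
cookie["lastvisit"]["path"] = "/cgi-bin"

# The integer is rendered as a GMT date when the header is output
header = cookie.output()
print(header)

# An attribute outside the predefined set is rejected
try:
    cookie["lastvisit"]["colour"] = "blue"  # made-up attribute
    rejected = False
except CookieError:
    rejected = True
print(rejected)
```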
The FieldStorage class of the cgi module has all that is needed to handle submitted forms.
import cgi
form = cgi.FieldStorage() # instantiate only once!
It is transparent to the programmer whether the data was submitted by GET or by POST. The interface is exactly the same.
* Unique field names :
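Suppose we have this HTML form which submits a field named name to a Python CGI script named process_form.py:
<html><body>
<form method="get" action="process_form.py">
Name: <input type="text" name="name">
<input type="submit" value="Submit">
</form>
</body></html>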
This is the process_form.py script:
#!/usr/bin/env python
import cgi

form = cgi.FieldStorage() # instantiate only once!
name = form.getfirst('name', 'empty')

# Avoid script injection escaping the user input
name = cgi.escape(name)

print """\
Content-Type: text/html\n
<html><body>
<p>The submitted name was "%s"</p>
</body></html>
""" % name
The getfirst() method returns the first value of the named field. If no field with that name was submitted, or if it is empty, it returns the given default, or None when no default is given. If there is more than one field with the same name, only the first will be returned.
If you change the HTML form method from get to post the process_form.py script will be the same.
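The getfirst() behaviour can be imitated with urllib.parse, which helps on recent Python 3 releases where the cgi module is no longer available. This is a sketch under that assumption, not the cgi implementation; the helper name and the sample query strings are invented.

```python
from html import escape
from urllib.parse import parse_qs

def getfirst(query_string, name, default=None):
    """First value of the named field, or default if absent or empty."""
    # parse_qs drops blank values by default, like FieldStorage does
    values = parse_qs(query_string).get(name)
    return values[0] if values else default

# Two fields share the name; only the first is returned, then escaped
name = escape(getfirst("name=%3Cb%3EAnn%3C%2Fb%3E&name=Bob", "name", "empty"))
print(name)
print(getfirst("name=", "name", "empty"))
```

Escaping with html.escape plays the role of cgi.escape in the script above.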
* Multiple field names:
If there is more than one field with the same name like in HTML input check boxes then the method to be used is getlist(). It will return a list containing as many items (the values) as checked boxes. If no check box was checked the list will be empty.
The HTML form for this example contains several check boxes that all share the name color, each with a different value. The corresponding process_check.py script:
#!/usr/bin/env python
import cgi

form = cgi.FieldStorage()

# getlist() returns a list containing the
# values of the fields with the given name
colors = form.getlist('color')

print "Content-Type: text/html\n"
print '<html><body>'
# One paragraph per checked box
for color in colors:
    print '<p>', cgi.escape(color), '</p>'
print '</body></html>'
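The list semantics can be tried without a form. A small Python 3 sketch with urllib.parse standing in for the cgi module follows; the helper name and the color values are examples, not taken from the original form.

```python
from html import escape
from urllib.parse import parse_qs

def getlist(query_string, name):
    """All values submitted under name; an empty list if there are none."""
    return parse_qs(query_string).get(name, [])

# Two boxes checked: two values come back, in submission order
colors = getlist("color=red&color=blue", "color")
items = ["<p>%s</p>" % escape(c) for c in colors]
print(items)

# No box checked: the field is simply absent from the request
print(getlist("", "color"))
```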
* File Upload;
To upload a file the HTML form must have the enctype attribute set to multipart/form-data. The input tag with the file type will create a “Browse” button.
The getfirst() and getlist() methods will only return the file(s) content. To also get the filename it is necessary to access a nested FieldStorage instance by its index in the top FieldStorage instance.
#!/usr/bin/env python
import cgi, os

form = cgi.FieldStorage()

# A nested FieldStorage instance holds the file
fileitem = form['file']

# Test if the file was uploaded
if fileitem.filename:
    # Strip any leading path from the client-supplied name
    # to avoid directory traversal
    fn = os.path.basename(fileitem.filename)
    # Binary mode keeps the content unchanged on every platform
    open('files/' + fn, 'wb').write(fileitem.file.read())
    message = 'The file "' + fn + '" was uploaded successfully'
else:
    message = 'No file was uploaded'

print """\
Content-Type: text/html\n
<html><body>
<p>%s</p>
</body></html>
""" % (message,)
The Apache user must have write permission on the directory where the file will be saved.
* Big File Upload
To handle big files without using all the available memory a generator can be used. The generator will return the file in small chunks:
#!/usr/bin/env python
import cgi, os

form = cgi.FieldStorage()

# Generator to buffer file chunks
def fbuffer(f, chunk_size=10000):
    while True:
        chunk = f.read(chunk_size)
        if not chunk: break
        yield chunk

# A nested FieldStorage instance holds the file
fileitem = form['file']

# Test if the file was uploaded
if fileitem.filename:
    # Strip any leading path from the client-supplied name
    # to avoid directory traversal
    fn = os.path.basename(fileitem.filename)
    # Binary mode keeps the content unchanged on every platform
    f = open('files/' + fn, 'wb')

    # Read the file in chunks
    for chunk in fbuffer(fileitem.file):
        f.write(chunk)
    f.close()
    message = 'The file "' + fn + '" was uploaded successfully'
else:
    message = 'No file was uploaded'

print """\
Content-Type: text/html\n
<html><body>
<p>%s</p>
</body></html>
""" % (message,)
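The chunked copy can be exercised on its own with an in-memory file. In this Python 3 sketch, save_upload is a hypothetical helper, the upload is simulated with io.BytesIO, and a temporary directory stands in for the files/ directory. Only the base name of the client-supplied file name is kept.

```python
import io
import os
import tempfile

def fbuffer(f, chunk_size=10000):
    """Yield the content of file object f in chunks of chunk_size bytes."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        yield chunk

def save_upload(fileobj, client_filename, directory):
    """Write fileobj into directory, keeping only the base name."""
    # Browsers on Windows may send backslash-separated paths
    name = os.path.basename(client_filename.replace("\\", "/"))
    target = os.path.join(directory, name)
    with open(target, "wb") as out:
        for chunk in fbuffer(fileobj):
            out.write(chunk)
    return target

# Simulated 25000-byte upload with a hostile file name
upload = io.BytesIO(b"x" * 25000)
saved = save_upload(upload, "../../etc/report.txt", tempfile.mkdtemp())
print(os.path.basename(saved), os.path.getsize(saved))
```

At no point is more than one chunk of the upload held in memory by the copy loop.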
Syntax and header errors are hard to catch unless you have access to the server logs. Syntax error messages can be seen if the script is run in a local shell before uploading to the server.
For a nice exceptions report there is the cgitb module. It will show a traceback inside a context. The default output is sent to standard output as HTML:
#!/usr/bin/env python
print "Content-Type: text/html"
print
import cgitb; cgitb.enable()
print 1/0
The handler() function can be used to report only the exceptions that were caught:
#!/usr/bin/env python
print "Content-Type: text/html"
print
import cgitb
try:
    f = open('non-existent-file.txt', 'r')
except:
    cgitb.handler()
There is also the option for a crude approach making the header “text/plain” and setting the standard error to standard out:
#!/usr/bin/env python
print "Content-Type: text/plain"
print
import sys
sys.stderr = sys.stdout
f = open('non-existent-file.txt', 'r')
Will output this:
Traceback (most recent call last):
  File "/var/www/html/teste/cgi-bin/text_error.py", line 6, in ?
    f = open('non-existent-file.txt', 'r')
IOError: [Errno 2] No such file or directory: 'non-existent-file.txt'
Warning: These techniques expose information that can be used by an attacker. Use them only while developing and debugging, and disable them in production.
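For production, one possible pattern is to log the traceback on the server and show the visitor only a generic message. The Python 3 sketch below is an assumption about how that could look, not something the cgitb module provides; run_safely, broken and the logger name are all invented for the example.

```python
import io
import logging
import os
import tempfile
import traceback

def run_safely(handler, log):
    """Return the CGI response of handler(), or a generic error page."""
    try:
        body = handler()
    except Exception:
        # The full traceback goes to the log, not to the visitor
        log.error(traceback.format_exc())
        return ("Status: 500 Internal Server Error\n"
                "Content-Type: text/plain\n\nAn error occurred.")
    return "Content-Type: text/plain\n\n" + body

def broken():
    # Path inside a fresh temporary directory, so it cannot exist
    missing = os.path.join(tempfile.mkdtemp(), "non-existent-file.txt")
    with open(missing) as f:
        return f.read()

# In this sketch the log is an in-memory stream instead of a file
log_stream = io.StringIO()
logger = logging.getLogger("cgi_errors")
logger.addHandler(logging.StreamHandler(log_stream))

response = run_safely(broken, logger)
print(response)
```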
#!/usr/bin/env python
print "Content-Type: text/html"
print
print """\
<html><body></body></html>
"""
To test your setup save it with the .py extension, upload it to your server as text and make it executable before trying to run it.
The first line of a python CGI script sets the path where the python interpreter will be found in the server. Ask your provider what is the correct one. If it is wrong the script will fail. Some examples:
#!/usr/bin/python
#!/usr/bin/python2.3
#!/usr/bin/python2.4
It is necessary that the script outputs the HTTP header. The HTTP header consists of one or more messages followed by a blank line. If the output of the script is to be interpreted as HTML then the content type will be text/html. The blank line signals the end of the header and is required.
print "Content-Type: text/html"
print
If you change the content type to text/plain the browser will not interpret the script’s output as HTML but as pure text and you will only see the HTML source. Try it now to never forget. A page refresh may be necessary for it to work.
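The header-then-blank-line layout can be made explicit with a small helper. This is a Python 3 sketch in which cgi_response is a hypothetical name and the cookie header is just an example of an extra header line.

```python
def cgi_response(body, content_type="text/html", extra_headers=()):
    """Build CGI output: header lines, one blank line, then the body."""
    lines = ["Content-Type: " + content_type]
    lines.extend(extra_headers)
    # The empty line between header and body is what ends the header
    return "\n".join(lines) + "\n\n" + body

response = cgi_response("<p>Hello</p>", extra_headers=["Set-Cookie: sid=abc123"])

# Splitting at the first blank line recovers the two parts
head, _, body = response.partition("\n\n")
print(head)
print(body)
```

Passing content_type="text/plain" to the same helper reproduces the experiment described above.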
* Client versus Server
All Python code is executed at the server only. The client’s agent (for example the browser) will never see a single line of Python. Instead it will only get the script’s output. This is really important to understand.
When programming for the Web you are in a client-server environment. Do not do things like trying to open a file on the client’s computer as if the script were running there. It isn’t.
Some host providers only let you run CGI scripts in a certain directory, often named cgi-bin. In this case all you have to do to run the script is to call it like this:
http://my_server.tld/cgi-bin/my_script.py
The script will have to be made executable by “others”. Give it a 755 permission or check the executable boxes if there is a graphical FTP interface.
Some hosts let you run CGI scripts in any directory. On some of these hosts you don’t have to do anything to configure the directories. On others you will have to add these lines to a file named .htaccess in the directory you want to run CGI scripts from:
Options +ExecCGI
AddHandler cgi-script .py
If the file does not exist create it. All directories below a directory with a .htaccess file will inherit the configurations. So if you want to be able to run CGI scripts from all directories create this file in the document root.
To run a script saved at the root:
http://my_server.tld/my_script.py
If it was saved in some directory:
http://my_server.tld/some_dir/some_subdir/my_script.py
Make sure all text files you upload to the server are uploaded as text (not binary), especially if you are on Windows, otherwise you will have problems.
The solution appears to be to always append a backslash ("\") to the end of shared drive paths.
>>> import os
>>> os.path.isdir('\\\\rorschach\\public')
0
>>> os.path.isdir('\\\\rorschach\\public\\')
1
It helps to think of share points as being like drive letters. Example:
k: is not a directory
k:\ is a directory
k:\media is a directory
k:\media\ is not a directory
The same rules apply if you substitute "k:" with "\\conky\foo":
\\conky\foo is not a directory
\\conky\foo\ is a directory
\\conky\foo\media is a directory
\\conky\foo\media\ is not a directory
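The drive-letter analogy can be turned into a helper that adds the backslash only to a bare drive or share root. This Python 3 sketch relies on ntpath.splitdrive, which treats \\server\share like a drive letter. The function name is made up, and the paths are the ones used above.

```python
import ntpath  # Windows path rules, importable on any platform

def with_root_backslash(path):
    """Append a backslash when path is a bare drive or share root."""
    drive, rest = ntpath.splitdrive(path)
    if drive and rest == "":
        return path + "\\"
    return path

for p in (r"\\rorschach\public", r"k:", r"k:\media", r"\\conky\foo\media"):
    print(p, "->", with_root_backslash(p))
```

Paths below the root are returned unchanged, so the result can be passed straight to os.path.isdir().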
Use win32api:
def kill(pid):
    """kill function for Win32"""
    import win32api
    # 1 is PROCESS_TERMINATE, the access right TerminateProcess needs
    handle = win32api.OpenProcess(1, 0, pid)
    return (0 != win32api.TerminateProcess(handle, 0))
On the Microsoft IIS server or on the Win95 MS Personal Web Server you set up Python in the same way that you would set up any other scripting engine.
Run regedt32 and go to:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W3SVC\Parameters\ScriptMap
and enter the following line (making any specific changes that your system may need):
.py :REG_SZ: c:\<path to python>\python.exe -u %s %s
This line will allow you to call your script with a simple reference like: http://yourserver/scripts/yourscript.py provided “scripts” is an “executable” directory for your server (which it usually is by default). The “-u” flag specifies unbuffered and binary mode for stdin – needed when working with binary data.
In addition, using ".py" as the file extension may not be a good idea in this context (you might want to reserve *.py for support modules and use *.cgi or *.cgp for "main program" scripts).
In order to set up Internet Information Services 5 to use Python for CGI processing, please see the following links:
http://www.e-coli.net/pyiis_server.html (for Win2k Server)
http://www.e-coli.net/pyiis.html (for Win2k pro)
Configuring Apache is much simpler. In the Apache configuration file httpd.conf, add the following line at the end of the file:
ScriptInterpreterSource Registry
Then, give your Python CGI-scripts the extension .py and put them in the cgi-bin directory.