Click here to check if anything new just came in.
February 07 2010
vegitarian dinner
Posted via email from Notez | Comment »
February 05 2010
February 03 2010
Redbottle
A REST-style pluggable application that neatly integrates bottle.py with RedisFebruary 02 2010
February 01 2010
Webapps on App Engine, part 4: Templating
This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.
In the first three posts of this series, we covered all the components of a bare bones webapp framework: routing, request/response encoding, and request handlers. Most people expect a lot more out of their web framework, however. Individual frameworks take different approaches to this, from the minimalist webapp framework, which provides the bare minimum plus some integration with other tools, to the all-inclusive django, to the 'best of breed' Pylons, which focuses on including and integrating the best libraries for each task, rather than writing their own.
For our framework, we're going to take an approach somewhere between webapp's and Pylons': While keeping our framework minimal and modular, we'll look at the best options to use for other components - specifically, templating and session handling. In this post, we'll discuss templating.
To anyone new to webapps, templates may seem somewhat unnecessary. We can simply generate the output direct from our code, right? Many CGI scripting languages used this approach, and the results are often messy. Sometimes, which page to be generated isn't clear until after a significant amount of processing has been done, and dealing with errors and other exceptional conditions likewise becomes problematic. Finally, this approach tends to lead to gobs of print statements cluttering up the code, making both the structure of the page and the flow of the code unclear.
Templating systems were designed to eliminate these issues. Instead of generating the page as we process the request, we wait until we're done, construct a dictionary of variables to pass to the template, and select and render the template we need. The templating system then takes care of interpreting the template, substituting in the variables we passed where necessary. Templating doesn't have to be complicated - here's a simple templating system:
def render_template(template, values): return template % values # Example template = """Hello, %(name)s! How are you this %(time_of_day)s?""" self.response.body = render_template(template, values)
This 'templating system' simply uses Python's string formatting functionality to generate its result. It quickly becomes apparrent that this isn't sufficient for templating web pages, though - we need more functionality. At a minimum, we need some form of flow control, so we can include sections of a page conditionally, such as login/logout links, and some form of looping construct, so we can include repeated sections, such as results from datastore queries. It helps if our templating system provides some features for template reuse, too, such as including other templates, or extending them.
How powerful should our templates be, though? This is a source of some disagreement. Some templating systems, like Django's, take a very minimalist approach, and contain only the bare minimum of functionality required to render templates. Any form of calculation - even things as simple as basic math and comparisons - should be done in code, with the results passed to the template. Other templating systems, like Mako, provide a much more full-featured templating language, and trust you not to abuse it.
Here's what sample templates look like in a few templating languages:
Django:
{% extends "base_generic.html" %}
{% block title %}{{ section.title }}{% endblock %}
{% block content %}
<h1>{{ section.title }}</h1>
{% for story in story_list %}
<h2>
<a href="{{ story.get_absolute_url }}">
{{ story.headline|upper }}
</a>
</h2>
<p>{{ story.tease|truncatewords:"100" }}</p>
{% endfor %}
{% endblock %}
You're probably already familiar with Django's template syntax from previous posts. Because it's included with App Engine, it's often the easy default. As we've already mentioned, it's very restrictive about what you can do: It provides a few primitives, and relies on an extensible library of tags and filters (the bits after the | in {{...}}) to make it useful.
Mako:
<%inherit file="base.html"/>
<%
rows = [[v for v in range(0,10)] for row in range(0,10)]
%>
<table>
% for row in rows:
${makerow(row)}
% endfor
</table>
<%def name="makerow(row)">
<tr>
% for name in row:
<td>${name}</td>\
% endfor
</tr>
</%def>
Mako is another popular templating engine, and takes the opposite approach to Django: Templates are created by inserting actual Python code, inside special processing directives of the form <% .. %>. It even goes so far as to permit and encourage defining functions inside templates! Mako works around Python's use of indentation for control flow by defining new keywords such as 'endfor'.
Cheetah
<html>
<head><title>$title</title></head>
<body>
<table>
#for $client in $clients
<tr>
<td>$client.surname, $client.firstname</td>
<td><a href="mailto:$client.email">$client.email</a></td>
</tr>
#end for
</table>
</body>
</html>
Cheetah is another templating system that takes the "we provide the gun, you point it at your foot" approach. It's similar to Mako in many ways, but doesn't require variable substitutions to be embedded in curly braces, and instead of using processing directives, it treats lines starting with a # as Python code. It also uses special directives such as 'end for' for nesting.
Jinja2
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html lang="en">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
{% block head %}
<link rel="stylesheet" href="style.css" />
<title>{% block title %}{% endblock %} - My Webpage</title>
{% endblock %}
</head>
<body>
<div id="content">{% block content %}{% endblock %}</div>
<div id="footer">
{% block footer %}
© Copyright 2008 by <a href="http://domain.invalid/">you</a>.
{% endblock %}
</div>
</body>
Jinja2 claims to have "Django-like syntax (but faster)". Notably, the syntactic elements for loops, control flow, etc, are a lot more Python-like, but they're still limited to what the language supports, and it still relies on specially defined tests, filters, etc.
Tenjin
<html>
<body>
<h1>${title}</h1>
<table>
<?py i = 0 ?>
<?py for item in items: ?>
<?py i += 1 ?>
<?py color = i % 2 == 0 and '#FFCCCC' or '#CCCCFF' ?>
<tr bgcolor="#{color}">
<td>#{i}</td>
<td>${item}</td>
</tr>
<?py #endfor ?>
</table>
</body>
</html>
Tenjin claims to be the fastest Python templating framework - and it has benchmarks (although not exhaustive ones) to back it up. It takes the general approach, with Python expressions and statements embedded directly into the markup. Notably, indentation of the Python statements maters, producing a somewhat confused mix of markup and Python code.
Chameleon
<table border="1">
<tr tal:repeat="row range(10)">
<td tal:repeat="column range(10)">
<span tal:define="x repeat.row.number;
y repeat.column.number;
z x * y"
tal:replace="string:$x * $y = $z">1 * 1 = 1</span>
</td>
</tr>
</table>
Chameleon is based on the Zope Page Templates specification, which takes an interesting approach. Chameleon templates are valid XML documents, unlike many templating engines, and it uses namespaces for attributes and tags to define the template behaviour. For example, the "tal:repeat" tag indicates that the tag it's on and all its children should be repeated for each value in the Python iterator passed as an argument. tal:replace replaces an element with the value of the expression, while other expressions permit setting attributes and replacing body content, and a 'meta' namespace handles operations such as including other templates.
Chameleon extends the Zope standard in several ways: expressions can be arbitrary Python expressions, and values can be substituted using a ${...} syntax in addition to the XML syntax, which avoids a lot of boilerplate in some situations. Chameleon templates are compiled to Python code on first use - so you're not doing XML DOM manipulation on every template generation. Chameleon recently got App Engine support when the author refactored out some code that relied on modules not available in App Engine.
I haven't mentioned all Python's templating systems here, by a long shot - this is merely a representative sample. For a more complete list, see this page.
Our framework
Examining the different templating engines leads to an interesting observation: Those templating engines that take the Django approach of "only what's necessary" tend to, of necessity, be a lot larger and more involved - and hence have a steeper learning curve - than those that allow you to leverage your Python knowledge in some fashion. For that reason and others, I'm not a big fan of them - I'd rather provide someone with a powerful but lightweight system, and trust them to shoot themselves in the foot, than use a more complicated system designed to make foot-shooting an impossibility. With that consideration and others in mind, we'll look at what is required to use Chameleon in our framework.
Using Chameleon
Installation is straightforward: Download the tarball from the PyPi page, and copy the Chameleon-1.1.1/src/chameleon directory into a directory on the system path (eg, your app's root folder).
To use Chameleon, we first define a template loader:
from chameleon.zpt import loader
template_path = os.path.join(os.path.dirname(__file__), "..", "templates)
template_loader = loader.TemplateLoader(template_path, auto_reload=os.environ['SERVER_SOFTWARE'].startswith('Dev'))
Template loaders serve to cache loaded and compiled templates, which is essential for performance. As such, it makes sense to define our loader once at module level, and use it for all requests. To render a template, we fetch a template from the loader, then call it with keyword arguments corresponding to the parameters we want to pass to the template:
def TestHandler(RequestHandler):
def get(self):
template = template_loader.load("test.pt")
self.response.body = template(name="test")
In terms of integrating templates into our framework, there's not a great deal we can do without locking users of our framework into using our preferred templating system. You can bundle the templating system with the framework, and even make it create a loader when it's first used. The approach you take depends on where you want to be on the flexibility / ease-of-use axis.
In the next post, we'll discuss session handling, the options available on App Engine, and how to integrate it into our framework.
January 29 2010
January 28 2010
January 27 2010
January 25 2010
Announcing memprof: a ruby level memory profiler
Hi - I've been working on a Ruby level memory profiler for Ruby 1.8.6 and 1.8.7 for a few weeks. This profiler is just a ruby gem (available on gemcutter) and can be installed and used with NO PATCHES to the Ruby VM. It provides functionality similar to bleakhouse. The gem ONLY SUPPORTS x86_64 LINUX builds of MRI 1.8.6/1.8.7 and REE. You can read about how to use the profiler on my blog: http://timetobleed.com/memprof-a-ruby-level-memory-profiler/ If you are interested in learning how this works without patching the ruby VM, have a look at my blog posts that explain the scary hacks needed: http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/ http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/ http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/ Questions, comments, and patches welcome! joe
Counting concurrent sessions
Answering questions asked on the site.
Steve asks:
I am trying to query a log table for a web service application and determine how many concurrent sessions are in progress at each moment a transaction is executed, based on a start date and an elapsed time for each command executed through the web service. (These metrics are logged after the fact, I’m trying to write daily performance reporting for the site).
Here’s a simplified view of my base table design:
CREATE TABLE CONNECTIONS ( USERID VARCHAR2(30), HANDLE VARCHAR2(40), PROCESS_DATE DATE, COMMAND NUMBER(6,0), COUNT NUMBER(10,0), ELAPSED_TIME NUMBER(10,0), NUM_OF_RECS NUMBER(10,0), COMMAND_TIMESTAMP TIMESTAMP (6) )
The question is: at there moment each transaction started, how many other transactions were active?
At each given moment, there is some number of active transaction. A transaction is active if the transaction begins before that moment and ends after it. This means that the moment should fall between command_timestamp and command_timestamp + elapsed_time / 86400000.
Database B-Tree indexes are not very good in queries that involve searching for a constant between two columns, so a self-join on the condition described above would be possible but not very efficient.
But these is a more simple solution.
Whenever a transaction starts, it increments the count of the open transactions. Whenever the transaction ends, it decrements it.
So we just can build a table of events
: starts and ends of the transactions, ordered chronologically. Each start
would be denoted with a +1, and each end
with a -1. Then we should just calculate the number of the transactions open so far and subtract the number of the transactions closed.
This can be easily done merely by calculating the partial sum of these +1’s and -1’s, which is an easy task for Oracle’s analytic functions.
Let’s create a sample table. I’ll put only the relevant columns there and add a stuffing column that would emulate actual payload, to measure performance:
Table creation details
BEGIN
DBMS_RANDOM.seed(20100125);
END;
/
CREATE TABLE t_session
(
id NOT NULL PRIMARY KEY,
command_timestamp NOT NULL,
elapsed_time NOT NULL,
stuffing NOT NULL
)
AS
SELECT level,
CAST(TO_DATE('25.01.2010', 'dd.mm.yyyy') - level / 1440 - DBMS_RANDOM.value / 2880 AS TIMESTAMP(6)),
CAST(DBMS_RANDOM.value / 360 AS NUMBER(7, 5)),
CAST(RPAD('*', 200, '*') AS VARCHAR2(200))
FROM dual
CONNECT BY
level <= 1000000
/
CREATE INDEX ix_session_commandtimestamp ON t_session (command_timestamp)
/
CREATE INDEX ix_session_end ON t_session (command_timestamp + elapsed_time)
/
For the sake of brevity, elapsed_time is expressed in days, not in milliseconds.
The table is indexed with two indexes: one on command_timestamp and another one on command_timestamp + elapsed_time.
We will calculate the average of the concurrent queries for the transactions started on Jan 1st, 2010.
To do this, we will first need to select all transactions that overlap this date. This includes all transactions whose command_timestamp or command_timestamp + elapsed_time is within this date. Both these conditions are sargable, but not at the same time. An OR clause here would be inefficient, since no single index can be used to filter on both conditions.
To work around this, we will just split the query in two parts. The first part will select all transactions that started on this date, the second part will select all transactions that ended on this date but started earlier. These two sets do not intersect, and their sum gives all transaction we are interested at. So we can just merge these two sets using UNION ALL:
WITH current_sessions AS
(
SELECT *
FROM t_session
WHERE command_timestamp >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
UNION ALL
SELECT *
FROM t_session
WHERE command_timestamp + elapsed_time >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
AND command_timestamp + elapsed_time < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy')
)
SELECT COUNT(*), SUM(LENGTH(stuffing))
FROM current_sessions
| COUNT(*) | SUM(LENGTH(STUFFING)) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1444 | 288800 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (0.0024s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
SORT AGGREGATE
VIEW
UNION-ALL PARTITION
TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_COMMANDTIMESTAMP
TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_END
Each part of the query used its own index and no extra effort is needed to get rid of the duplicates, so the query completes in 2 ms.
Now, we should need to calculate the number of concurrent transactions.
To do this, we will duplicate the recordset we got on the previous step, adding an extra field, event.
The first copy, with event = 1 will hold the beginnings of the transactions; the second copy with event = -1 will hold the ends. This will give us the equal number of the records with the opposite signs, so ultimately they will add up to a zero (since all transactions get into the table only after they complete). Each transaction will therefore be split into two records.
These records will then be sorted by event_date (which corresponds to command_timestamp or command_timestamp + elapsed_time respectively, depending on the set the record belongs to).
Then, the partial sum will be calculated for each record, using Oracle’s SUM() OVER () analytic function.
Since the events are ordered by their date, the value of the partial sum will hold the number of transactions open (since their opening record had already been selected and added a +1 to the sum), but not yet closed (since their closing record had not yet been selected). We don’t know which transactions exactly contributed to the sum, but we are not interested in this information. All we know (and all we need to know) is the difference between the numbers of open and closed transactions.
On this step we have the partial sum for each record, but we need to get rid of the extra records. We are interested in the number of concurrent transactions at the moments each transaction began, so we will just filter the resultset so that it only selects the records with event = 1, since they correspond to the beginning of the transactions.
Finally, we just need to subtract 1 from the partial sums. This is because the transaction records contribute to the partial sums too, and we need to count the number of concurrent transactions, not the total transactions.
And, finally, here’s the query:
WITH current_sessions AS
(
SELECT *
FROM t_session
WHERE command_timestamp >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
UNION ALL
SELECT *
FROM t_session
WHERE command_timestamp + elapsed_time >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
AND command_timestamp + elapsed_time < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy')
)
SELECT AVG(concurrent), SUM(LENGTH(stuffing))
FROM (
SELECT q.*,
SUM(event) OVER (ORDER BY event_date, event DESC) - 1 AS concurrent
FROM (
SELECT 1 AS event,
command_timestamp AS event_date,
cb.*
FROM current_sessions cb
UNION ALL
SELECT -1 AS event,
command_timestamp + elapsed_time AS event_date,
ce.*
FROM current_sessions ce
) q
)
WHERE event = 1
| AVG(CONCURRENT) | SUM(LENGTH(STUFFING)) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1,51662049861495844875346260387811634349 | 288800 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (0.0691s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
TEMP TABLE TRANSFORMATION
LOAD AS SELECT
UNION-ALL
TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_COMMANDTIMESTAMP
TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_END
SORT AGGREGATE
VIEW
WINDOW SORT
VIEW
UNION-ALL
VIEW
TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6836_1825AC8
VIEW
TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6836_1825AC8
The query completes in only 69 ms.
Hope that helps.
I’m always glad to answer the questions regarding database queries.
Maybe Soup is currently being updated? I'll try again automatically in a few seconds...
