Tumblelog by Soup.io
Newer posts are loading.
You are at the newest post.
Click here to check if anything new just came in.

February 07 2010

zh

vegitarian dinner

February 05 2010

zh
zh

February 03 2010

zh
zh

Redbottle

A REST-style pluggable application that neatly integrates bottle.py with Redis
zh

February 02 2010

zh
zh

February 01 2010

zh

Webapps on App Engine, part 4: Templating

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

In the first three posts of this series, we covered all the components of a bare bones webapp framework: routing, request/response encoding, and request handlers. Most people expect a lot more out of their web framework, however. Individual frameworks take different approaches to this, from the minimalist webapp framework, which provides the bare minimum plus some integration with other tools, to the all-inclusive django, to the 'best of breed' Pylons, which focuses on including and integrating the best libraries for each task, rather than writing their own.

For our framework, we're going to take an approach somewhere between webapp's and Pylons': While keeping our framework minimal and modular, we'll look at the best options to use for other components - specifically, templating and session handling. In this post, we'll discuss templating.

To anyone new to webapps, templates may seem somewhat unnecessary. We can simply generate the output direct from our code, right? Many CGI scripting languages used this approach, and the results are often messy. Sometimes, which page to be generated isn't clear until after a significant amount of processing has been done, and dealing with errors and other exceptional conditions likewise becomes problematic. Finally, this approach tends to lead to gobs of print statements cluttering up the code, making both the structure of the page and the flow of the code unclear.

Templating systems were designed to eliminate these issues. Instead of generating the page as we process the request, we wait until we're done, construct a dictionary of variables to pass to the template, and select and render the template we need. The templating system then takes care of interpreting the template, substituting in the variables we passed where necessary. Templating doesn't have to be complicated - here's a simple templating system:

def render_template(template, values):
  return template % values

# Example
template = """Hello, %(name)s! How are you this %(time_of_day)s?"""
self.response.body = render_template(template, values)

This 'templating system' simply uses Python's string formatting functionality to generate its result. It quickly becomes apparrent that this isn't sufficient for templating web pages, though - we need more functionality. At a minimum, we need some form of flow control, so we can include sections of a page conditionally, such as login/logout links, and some form of looping construct, so we can include repeated sections, such as results from datastore queries. It helps if our templating system provides some features for template reuse, too, such as including other templates, or extending them.

How powerful should our templates be, though? This is a source of some disagreement. Some templating systems, like Django's, take a very minimalist approach, and contain only the bare minimum of functionality required to render templates. Any form of calculation - even things as simple as basic math and comparisons - should be done in code, with the results passed to the template. Other templating systems, like Mako, provide a much more full-featured templating language, and trust you not to abuse it.

Here's what sample templates look like in a few templating languages:

Django:

{% extends "base_generic.html" %}

{% block title %}{{ section.title }}{% endblock %}

{% block content %}
<h1>{{ section.title }}</h1>

{% for story in story_list %}
<h2>
  <a href="{{ story.get_absolute_url }}">
    {{ story.headline|upper }}
  </a>
</h2>
<p>{{ story.tease|truncatewords:"100" }}</p>
{% endfor %}
{% endblock %}

You're probably already familiar with Django's template syntax from previous posts. Because it's included with App Engine, it's often the easy default. As we've already mentioned, it's very restrictive about what you can do: It provides a few primitives, and relies on an extensible library of tags and filters (the bits after the | in {{...}}) to make it useful.

Mako:

<%inherit file="base.html"/>
<%
    rows = [[v for v in range(0,10)] for row in range(0,10)]
%>
<table>
    % for row in rows:
        ${makerow(row)}
    % endfor
</table>
   
<%def name="makerow(row)">
    <tr>
    % for name in row:
        <td>${name}</td>\
    % endfor
    </tr>
</%def>

Mako is another popular templating engine, and takes the opposite approach to Django: Templates are created by inserting actual Python code, inside special processing directives of the form <% .. %>. It even goes so far as to permit and encourage defining functions inside templates! Mako works around Python's use of indentation for control flow by defining new keywords such as 'endfor'.

Cheetah

<html>
  <head><title>$title</title></head>
  <body>
    <table>
      #for $client in $clients
      <tr>
        <td>$client.surname, $client.firstname</td>
        <td><a href="mailto:$client.email">$client.email</a></td>
      </tr>
      #end for
    </table>
  </body>
</html>

Cheetah is another templating system that takes the "we provide the gun, you point it at your foot" approach. It's similar to Mako in many ways, but doesn't require variable substitutions to be embedded in curly braces, and instead of using processing directives, it treats lines starting with a # as Python code. It also uses special directives such as 'end for' for nesting.

Jinja2

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html lang="en">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    {% block head %}
    <link rel="stylesheet" href="style.css" />
    <title>{% block title %}{% endblock %} - My Webpage</title>
    {% endblock %}
</head>
<body>
    <div id="content">{% block content %}{% endblock %}</div>
    <div id="footer">
        {% block footer %}
        © Copyright 2008 by <a href="http://domain.invalid/">you</a>.
        {% endblock %}
    </div>
</body>

Jinja2 claims to have "Django-like syntax (but faster)". Notably, the syntactic elements for loops, control flow, etc, are a lot more Python-like, but they're still limited to what the language supports, and it still relies on specially defined tests, filters, etc.

Tenjin

<html>
  <body>
    <h1>${title}</h1>
    <table>
<?py i = 0 ?>
<?py for item in items: ?>
<?py     i += 1 ?>
<?py     color = i % 2 == 0 and '#FFCCCC' or '#CCCCFF' ?>
      <tr bgcolor="#{color}">
        <td>#{i}</td>
        <td>${item}</td>
      </tr>
<?py #endfor ?>
    </table>
  </body>
</html>

Tenjin claims to be the fastest Python templating framework - and it has benchmarks (although not exhaustive ones) to back it up. It takes the general approach, with Python expressions and statements embedded directly into the markup. Notably, indentation of the Python statements maters, producing a somewhat confused mix of markup and Python code.

Chameleon

<table border="1">
  <tr tal:repeat="row range(10)">
    <td tal:repeat="column range(10)">
      <span tal:define="x repeat.row.number;
                        y repeat.column.number;
                        z x * y"
            tal:replace="string:$x * $y = $z">1 * 1 = 1</span>
    </td>
  </tr>
</table>

Chameleon is based on the Zope Page Templates specification, which takes an interesting approach. Chameleon templates are valid XML documents, unlike many templating engines, and it uses namespaces for attributes and tags to define the template behaviour. For example, the "tal:repeat" tag indicates that the tag it's on and all its children should be repeated for each value in the Python iterator passed as an argument. tal:replace replaces an element with the value of the expression, while other expressions permit setting attributes and replacing body content, and a 'meta' namespace handles operations such as including other templates.

Chameleon extends the Zope standard in several ways: expressions can be arbitrary Python expressions, and values can be substituted using a ${...} syntax in addition to the XML syntax, which avoids a lot of boilerplate in some situations. Chameleon templates are compiled to Python code on first use - so you're not doing XML DOM manipulation on every template generation. Chameleon recently got App Engine support when the author refactored out some code that relied on modules not available in App Engine.

I haven't mentioned all Python's templating systems here, by a long shot - this is merely a representative sample. For a more complete list, see this page.

Our framework

Examining the different templating engines leads to an interesting observation: Those templating engines that take the Django approach of "only what's necessary" tend to, of necessity, be a lot larger and more involved - and hence have a steeper learning curve - than those that allow you to leverage your Python knowledge in some fashion. For that reason and others, I'm not a big fan of them - I'd rather provide someone with a powerful but lightweight system, and trust them to shoot themselves in the foot, than use a more complicated system designed to make foot-shooting an impossibility. With that consideration and others in mind, we'll look at what is required to use Chameleon in our framework.

Using Chameleon

Installation is straightforward: Download the tarball from the PyPi page, and copy the Chameleon-1.1.1/src/chameleon directory into a directory on the system path (eg, your app's root folder).

To use Chameleon, we first define a template loader:

from chameleon.zpt import loader

template_path = os.path.join(os.path.dirname(__file__), "..", "templates)
template_loader = loader.TemplateLoader(template_path, auto_reload=os.environ['SERVER_SOFTWARE'].startswith('Dev'))

Template loaders serve to cache loaded and compiled templates, which is essential for performance. As such, it makes sense to define our loader once at module level, and use it for all requests. To render a template, we fetch a template from the loader, then call it with keyword arguments corresponding to the parameters we want to pass to the template:

def TestHandler(RequestHandler):
  def get(self):
    template = template_loader.load("test.pt")
    self.response.body = template(name="test")

In terms of integrating templates into our framework, there's not a great deal we can do without locking users of our framework into using our preferred templating system. You can bundle the templating system with the framework, and even make it create a loader when it's first used. The approach you take depends on where you want to be on the flexibility / ease-of-use axis.

In the next post, we'll discuss session handling, the options available on App Engine, and how to integrate it into our framework.

zh

January 29 2010

zh

January 28 2010

zh
zh
zh
zh
zh

January 27 2010

zh
zh

January 25 2010

zh

Announcing memprof: a ruby level memory profiler

Hi -

I've been working on a Ruby level memory profiler for Ruby 1.8.6 and
1.8.7 for a few weeks. This profiler is just a ruby gem (available on
gemcutter) and can be installed and used with NO PATCHES to the Ruby
VM. It provides functionality similar to bleakhouse.

The gem ONLY SUPPORTS x86_64 LINUX builds of MRI 1.8.6/1.8.7 and REE.

You can read about how to use the profiler on my blog:
http://timetobleed.com/memprof-a-ruby-level-memory-profiler/

If you are interested in learning how this works without patching the
ruby VM, have a look at my blog posts that explain the scary hacks
needed:
http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/
http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/
http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/

Questions, comments, and patches welcome!

joe


zh

Counting concurrent sessions

Answering questions asked on the site.

Steve asks:

I am trying to query a log table for a web service application and determine how many concurrent sessions are in progress at each moment a transaction is executed, based on a start date and an elapsed time for each command executed through the web service. (These metrics are logged after the fact, I’m trying to write daily performance reporting for the site).

Here’s a simplified view of my base table design:

CREATE TABLE CONNECTIONS
(
USERID VARCHAR2(30),
HANDLE VARCHAR2(40),
PROCESS_DATE DATE,
COMMAND NUMBER(6,0),
COUNT NUMBER(10,0),
ELAPSED_TIME NUMBER(10,0),
NUM_OF_RECS NUMBER(10,0),
COMMAND_TIMESTAMP TIMESTAMP (6)
)

The question is: at there moment each transaction started, how many other transactions were active?

At each given moment, there is some number of active transaction. A transaction is active if the transaction begins before that moment and ends after it. This means that the moment should fall between command_timestamp and command_timestamp + elapsed_time / 86400000.

Database B-Tree indexes are not very good in queries that involve searching for a constant between two columns, so a self-join on the condition described above would be possible but not very efficient.

But these is a more simple solution.

Whenever a transaction starts, it increments the count of the open transactions. Whenever the transaction ends, it decrements it.

So we just can build a table of events: starts and ends of the transactions, ordered chronologically. Each start would be denoted with a +1, and each end with a -1. Then we should just calculate the number of the transactions open so far and subtract the number of the transactions closed.

This can be easily done merely by calculating the partial sum of these +1’s and -1’s, which is an easy task for Oracle’s analytic functions.

Let’s create a sample table. I’ll put only the relevant columns there and add a stuffing column that would emulate actual payload, to measure performance:

Table creation details

BEGIN
        DBMS_RANDOM.seed(20100125);
END;
/

CREATE TABLE t_session
        (
        id NOT NULL PRIMARY KEY,
        command_timestamp NOT NULL,
        elapsed_time NOT NULL,
        stuffing NOT NULL
        )
AS
SELECT  level,
        CAST(TO_DATE('25.01.2010', 'dd.mm.yyyy') - level / 1440 - DBMS_RANDOM.value / 2880 AS TIMESTAMP(6)),
        CAST(DBMS_RANDOM.value / 360 AS NUMBER(7, 5)),
        CAST(RPAD('*', 200, '*') AS VARCHAR2(200))
FROM    dual
CONNECT BY
        level <= 1000000
/

CREATE INDEX ix_session_commandtimestamp ON t_session (command_timestamp)
/

CREATE INDEX ix_session_end ON t_session (command_timestamp + elapsed_time)
/

For the sake of brevity, elapsed_time is expressed in days, not in milliseconds.

The table is indexed with two indexes: one on command_timestamp and another one on command_timestamp + elapsed_time.

We will calculate the average of the concurrent queries for the transactions started on Jan 1st, 2010.

To do this, we will first need to select all transactions that overlap this date. This includes all transactions whose command_timestamp or command_timestamp + elapsed_time is within this date. Both these conditions are sargable, but not at the same time. An OR clause here would be inefficient, since no single index can be used to filter on both conditions.

To work around this, we will just split the query in two parts. The first part will select all transactions that started on this date, the second part will select all transactions that ended on this date but started earlier. These two sets do not intersect, and their sum gives all transaction we are interested at. So we can just merge these two sets using UNION ALL:

WITH    current_sessions AS
        (
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
                AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
        UNION ALL
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp + elapsed_time >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
                AND command_timestamp + elapsed_time < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
                AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy')
        )
SELECT  COUNT(*), SUM(LENGTH(stuffing))
FROM    current_sessions
COUNT(*) SUM(LENGTH(STUFFING))
1444 288800
1 row fetched in 0.0001s (0.0024s)
SELECT STATEMENT
 SORT AGGREGATE
  VIEW
   UNION-ALL PARTITION
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_COMMANDTIMESTAMP
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_END

Each part of the query used its own index and no extra effort is needed to get rid of the duplicates, so the query completes in 2 ms.

Now, we should need to calculate the number of concurrent transactions.

To do this, we will duplicate the recordset we got on the previous step, adding an extra field, event.

The first copy, with event = 1 will hold the beginnings of the transactions; the second copy with event = -1 will hold the ends. This will give us the equal number of the records with the opposite signs, so ultimately they will add up to a zero (since all transactions get into the table only after they complete). Each transaction will therefore be split into two records.

These records will then be sorted by event_date (which corresponds to command_timestamp or command_timestamp + elapsed_time respectively, depending on the set the record belongs to).

Then, the partial sum will be calculated for each record, using Oracle’s SUM() OVER () analytic function.

Since the events are ordered by their date, the value of the partial sum will hold the number of transactions open (since their opening record had already been selected and added a +1 to the sum), but not yet closed (since their closing record had not yet been selected). We don’t know which transactions exactly contributed to the sum, but we are not interested in this information. All we know (and all we need to know) is the difference between the numbers of open and closed transactions.

On this step we have the partial sum for each record, but we need to get rid of the extra records. We are interested in the number of concurrent transactions at the moments each transaction began, so we will just filter the resultset so that it only selects the records with event = 1, since they correspond to the beginning of the transactions.

Finally, we just need to subtract 1 from the partial sums. This is because the transaction records contribute to the partial sums too, and we need to count the number of concurrent transactions, not the total transactions.

And, finally, here’s the query:

WITH    current_sessions AS
        (
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
                AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
        UNION ALL
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp + elapsed_time >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
                AND command_timestamp + elapsed_time < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
                AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy')
        )
SELECT  AVG(concurrent), SUM(LENGTH(stuffing))
FROM    (
        SELECT  q.*,
                SUM(event) OVER (ORDER BY event_date, event DESC) - 1 AS concurrent
        FROM    (
                SELECT  1 AS event,
                        command_timestamp AS event_date,
                        cb.*
                FROM    current_sessions cb
                UNION ALL
                SELECT  -1 AS event,
                        command_timestamp + elapsed_time AS event_date,
                        ce.*
                FROM    current_sessions ce
                ) q
        )
WHERE   event = 1
AVG(CONCURRENT) SUM(LENGTH(STUFFING))
1,51662049861495844875346260387811634349 288800
1 row fetched in 0.0001s (0.0691s)
SELECT STATEMENT
 TEMP TABLE TRANSFORMATION
  LOAD AS SELECT
   UNION-ALL
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_COMMANDTIMESTAMP
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_END
  SORT AGGREGATE
   VIEW
    WINDOW SORT
     VIEW
      UNION-ALL
       VIEW
        TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6836_1825AC8
       VIEW
        TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6836_1825AC8

The query completes in only 69 ms.

Hope that helps.


I’m always glad to answer the questions regarding database queries.

Ask me a question

Tags: Oracle
Older posts are this way If this message doesn't go away, click anywhere on the page to continue loading posts.
Could not load more posts
Maybe Soup is currently being updated? I'll try again automatically in a few seconds...
Just a second, loading more posts...
You've reached the end.