![]() |
|
Published 2006-07-29 Printer-friendly version
Advertisement
|
The Clarion Reference Library Clarion Databases & SQL
|
Mark Newsome, an HP developer, presented his take on improving query interfaces for web applications. His interface starts with a search form; the user chooses zero to n filter options, and then clicks a button. Rather than immediately displaying the search results, which could be in the thousands of records, he displays the number of records found, allowing the user to further refine the query if desired.
Keys to the search interface's ease of use are:
Mark also commented on the user of cursors for large data sets. The problem here is that you have to keep the cursor open, but since this is a web app you have no idea how long the user will be in the session. You might be using that resource unnecessarily. Instead, you can keep a list of just the primary keys (and maybe a sort field) for the returned items, in the sort order, and when needed for another page retrieve the rest of the data by primary key.
Rasmus Lerdorf, the inventor of PHP (and now one of about a thousand PHP committers), described Web 2.0 as putting a prettier face on the web, with fewer visible round trips to and from the server (there may actually be more trips but the user doesn't see most of these). You do need a rich presentation layer language, a server-side language (PHP in Rasmus' example, but it could be any of a number of other languages), a wire data format (XML and or JSON), a server side data store, and a semblance of security (there is no such thing as a totally secure web app).
Rasmus presented a scenario where a web site has half a million users, at an average of 100 requests per use. This works out to 578 requests per second, and you should use a factor of three to determine peak load (1734 requests per second).
You can load test with http_load from acme.com. If your CPU idle time is zero and you're not at your target, you can work on speeding up your app. If your CPU is not bound, then you are IO bound.
20 ms is a good target for average latency - 250 ms is pretty bad. The initial version of the app could only server 17 requests per second, meaning the application would need 100 servers, a substantial expense for a company in startup mode.
Rasmus commented that "if you wrote it, you're the last person who should be fixing it" since you would have written it better if you knew how. You can analyze your PHP using the Callgrind profiler, which sits on top of Valgrind. Run your server under Valgrind/Callgrind for testing only as profiling slows the server tremendously.
Rasmus demonstrated how non-persistent, secure connections to PostgreSQLcan significantly hurt performance, and noted that MySQL is dramatically fast at connecting; non-persistent MySQL connections are faster than persistent connections in PostgreSQL. Query caching boosts performance a little more.
After pooling/caching database connections, the next step is to add an opcode cache like APC, so the server doesn't keep reloading your PHP code.
Eventually Rasmus got the app trimmed to a very respectable 1100 requests per second, for a two-server system instead of a 100 server farm.
Rasmus also demonstrated some useful features in PHP 5, including including, reluctantly, SOAP capability. He's no fan of SOAP, demonstrating the massive WSDL file needed for a service that simply adds two numbers. He is a fan of RSS. Other nice bits in the presentation included geocoding and map tiling APIs from Yahoo. He concluded with a 30 second AJAX demonstration using Yahoo controls, and pointed to the Yahoo User Interface Library page.
You can download the slide show online. The slides were available online at the start of the session, and with wireless working it was nice to be able to view my own copy during the presentation.
TimeTravel tables allow you to get snapshots of your data at a particular point in time. Although A. Elein demonstrated this technique for PostgreSQL, you can apply the concept to just about any SQL (or, with extra code, a flat file) database.
The basic cost to your data structure is two timestamp columns in every row; queries require a qualification on one or both columns, and there is a small overhead for update and delete triggers.
In the old days PostreSQL did have time travel, since it had no-overwrite storage and you could simply decide not to vacuum the database. This technique creates the same kind of storage. You only insert data - deletes don't remove data, and updates insert a new changed row. You can be selective about which tables use this technique; in PostgreSQL you can template the necessary code.
First, define a table with a primary key, with start and end time columns (timestamps). The combination of primary key and end time must be unique. Current data will always have a null end time.
Create an inventory_at_time(timestamp) function for each table
(in PostgreSQL, one copy of the function for each table results in faster, inline
SQL) which returns a set of records which are valid for the given timestamp.
You can further use this set in a FROM clause.
The Delete rule simply sets the end time to the current time stamp where the key matches and the end time is null.
Updates are more complicated. If someone else is updating the row, the update needs to be disallowed. If a non-null end time is present on the old record then do nothing. If the new end time is null, then update the old record end time with the current timestamp, and insert the new record.
You may or may not want to be aggressive with the insert trigger, enforcing null end time and current start time.
Using this technique, you can retrieve a snapshot of your data at any time.
A. Elein Mustain is the author of an irregularly weekly column called PostgreSQL General Bits.
David Harms is an independent software developer and the editor and publisher of Clarion Magazine. He is also co-author with Ross Santos of Developing Clarion for Windows Applications, published by SAMS (1995), and has written or co-written several Java books. David is a member of the American Society of Journalists and Authors (ASJA).
Copyright © 1999-2008 by CoveComm Inc. All Rights Reserved. Reproduction in any form without the express written consent of CoveComm Inc., except as described in the subscription agreement, is prohibited.
Clarion Magazine ISSN 1718-9942
One year: $159
(reg $189, save $30)
(includes all back issues since '99)
Renewals from $109
Two years: $249
(reg $289, save $40)
Renewals from $199