Wednesday, March 11, 2009

Web Workflows - An Automated Approach to Web Browsing

The web needs an easy-to-read language for defining web tasks.

I propose XML-based Web Workflows:
<workflow>
<open>http://www.google.com</open>
<input name="q">Straw Berry</input>
<click name="btnG"></click>
</workflow>

Execution of this web workflow would result in a document with the search results attached.
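To make the idea concrete, here is a minimal sketch in Python of how an engine might read such a workflow. It is not the Chisimba implementation: the function names parse_workflow and describe are hypothetical, and the "click means submit the form" reading is an assumption. It performs a dry run only and makes no network requests. Note that the input element has to be closed with </input> for the document to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# The example workflow from the post, with the input element properly closed.
WORKFLOW = """
<workflow>
<open>http://www.google.com</open>
<input name="q">Straw Berry</input>
<click name="btnG"></click>
</workflow>
"""


def parse_workflow(xml_text):
    """Turn a workflow document into an ordered list of (action, name, value) steps."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "workflow":
        raise ValueError("root element must be <workflow>")
    steps = []
    for element in root:
        value = (element.text or "").strip()
        steps.append((element.tag, element.get("name"), value))
    return steps


def describe(steps):
    """Dry run: describe what an engine would do, without touching the network."""
    lines = []
    for action, name, value in steps:
        if action == "open":
            lines.append(f"GET {value}")
        elif action == "input":
            lines.append(f"set field {name!r} to {value!r}")
        elif action == "click":
            # Assumption: clicking a named button submits the enclosing form.
            lines.append(f"submit form via {name!r}")
        else:
            raise ValueError(f"unknown action: {action}")
    return lines


if __name__ == "__main__":
    for line in describe(parse_workflow(WORKFLOW)):
        print(line)
```

A real engine would replace describe with code that fetches the page, fills in the form and follows the submission, but the step list is the part the workflow language itself defines.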

I have a working open source implementation of this in PHP via Chisimba.

Description:
This module interprets XML-based Web Workflows to programmatically browse
the web and return a particular URI endpoint. The language includes syntax like

<workflow>
<open>http://www.google.com</open>
<input name="q">Straw Berry</input>
<click name="btnG"></click>
</workflow>


This represents an easy-to-understand, high-level, procedural language for
automated web document retrieval.

This module will also allow you to specify login credentials to automate logging in to sites in order to access protected resources. It will be used by the librarysearch module to assist with document retrieval on clusters of hosts.

Benefits:
This implementation is completely portable. Because it accepts a web workflow as input and produces a document as output, it could run as a standalone app like curl, or maybe even as a curl extension.

This would contribute hugely to federated search efforts where an API isn't available for a certain host.

History:
Screen scraping tasks have mostly been executed as an unordered and messy combination of curl, lynx -d and wget requests. This process needs to be formalized by the web community, for example through the W3C.

References:
The Java guys already have this available to them (see here), but it's too proprietary in the sense that you can only get it up and running by creating the engine from within Java code.

Concerns:
Webstats watchers and web ad agencies aren't going to like the adoption of a formal method for robots to surf their sites to effectively carry content to users. Perhaps this will create an opportunity for other revenue models to surface.

Comments anyone?

Wednesday, March 4, 2009

PHPUnit Test Case Builder for Chisimba

I have started work on a very nifty module for Chisimba that will allow any Chisimba module developer to create a complete PHPUnit test case for their module.

Introduction
The Chisimba PHPUnit Test Case Builder generates PHP test case classes for any Chisimba module.

Module Outputs
It implements a code pattern analyzer that classifies code into logical groups like Data Management, Utility, Callback and Display methods.

These code groupings are then transferred to a single PHPUnit skeleton class divided into the logical associations. All the standard, known asserts are added to the code, making it ready for PHPUnit to process on the fly.
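As a rough illustration of the classification step, here is a sketch in Python that sorts method names into the four groups by name prefix. This is an assumption about how such an analyzer could work, not a description of the module's actual rules: the prefix lists and the function names classify and group_methods are hypothetical.

```python
# Hypothetical prefix lists; a real analyzer would likely inspect
# method bodies and not rely on names alone.
GROUPS = {
    "Data Management": ("add", "insert", "edit", "update", "delete", "remove"),
    "Callback": ("callback", "on_", "handle"),
    "Display": ("show", "display", "render", "build"),
}


def classify(method_name):
    """Return the group for a method name; anything unmatched counts as Utility."""
    lowered = method_name.lower()
    for group, prefixes in GROUPS.items():
        if lowered.startswith(prefixes):
            return group
    return "Utility"


def group_methods(method_names):
    """Collect method names under their group, preserving input order per group."""
    grouped = {}
    for name in method_names:
        grouped.setdefault(classify(name), []).append(name)
    return grouped


if __name__ == "__main__":
    print(group_methods(["insertRecord", "showForm", "formatDate", "deleteRecord"]))
```

Prefix matching is deliberately naive (a method called addressLookup would land in Data Management), which is one reason the developer still reviews the generated skeleton.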

It creates a file with the following checks:
  • Uses the register.conf file to check against the current environment for dependencies.
  • Uses the live database structures to construct wrapper code for individual table field checks.
  • Breaks the Data Management tests down logically into Add/Edit/Delete sections and uses the table field checks to verify expected data (the table field data type is used to produce test data).
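The idea of producing test data from a field's data type can be sketched as follows, in Python for brevity. The type names, the sample values and the helper names sample_value and build_row are all assumptions made for illustration; the module's own mapping may differ.

```python
import re

# Matches a base type name and an optional length, e.g. "varchar(32)".
TYPE_RE = re.compile(r"^\s*([A-Za-z]+)\s*(?:\(\s*(\d+))?")


def sample_value(sql_type):
    """Return a representative test value for a SQL column type string."""
    match = TYPE_RE.match(sql_type)
    if not match:
        raise ValueError(f"unrecognised column type: {sql_type!r}")
    base = match.group(1).lower()
    length = int(match.group(2)) if match.group(2) else None
    if base in ("int", "integer", "smallint", "bigint", "tinyint"):
        return 1
    if base in ("float", "double", "decimal", "numeric", "real"):
        return 1.5
    if base in ("char", "varchar"):
        # Stay within the declared length, capped at 10 characters.
        return "x" * min(length or 10, 10)
    if base == "text":
        return "sample text"
    if base == "date":
        return "2009-03-04"
    if base in ("datetime", "timestamp"):
        return "2009-03-04 12:00:00"
    raise ValueError(f"unsupported column type: {sql_type!r}")


def build_row(columns):
    """Build one test row from a mapping of column name to SQL type."""
    return {name: sample_value(sql_type) for name, sql_type in columns.items()}


if __name__ == "__main__":
    print(build_row({"id": "int(11)", "title": "varchar(3)", "created": "datetime"}))
```

A generated Add test could insert such a row and then assert that each field reads back with the same value.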

Benefits
This gives the developer a solid base to start unit testing from, as the last step is to peruse the skeleton and select the proper assertions to use. The benefits of having logically grouped test cases include the ability to make core changes and see the ripple effects, identify possible problem areas (e.g. fields passed as arguments to insert/edit that do not match database field names), identify optimization targets and improve the overall integrity of your module.

Future Plans
For display/UI testing, there are plans to integrate and automate Selenium test code generation, as PHPUnit already has some form of support for this.

Stay tuned for progress.