Monday, August 18, 2008

Pre-Master Final Project (Reverse Query Processing)


In September 2007, I joined the Faculty of Computers and Information at Cairo University to pursue a master's degree in computer science. The first year was a full study stage covering six courses:

  1. Information System Development Methodologies
  2. Natural Language Processing
  3. Object Oriented Database
  4. Database Systems
  5. Data Mining
  6. E-Commerce

Alongside these courses, there is a final project that must be passed before master's registration.
I started by surveying open topics in computer science fields such as:

  • Parallel and Distributed Computing
  • Requirements Engineering
  • Software Engineering
  • Web Based Decision Support Systems
  • Advances in Data and Knowledge Engineering
  • Data Mining and Computer Modeling in Tourism
  • Health and Biomedical Informatics
  • Multimedia Computing
  • Natural Language Processing
  • Networks and Information Security
  • ....etc

The goal was to choose a topic that I could continue working on for my master's degree. After surveying these fields and their open topics, I found a set of topics that match my interests in software engineering, such as:

  • Reverse Query Processing
  • Morphological Analysis and generation
  • Text and web content mining
  • Model-Driven DSS
  • Radiology structured-reporting
  • Gene sequence Annotation
  • Domain-Specific RE Processes
  • ...etc

After that, I decided to work on Reverse Query Processing under the supervision of Dr. Ali El Bastawesy. The following are some notes written after finalizing and discussing the project.


Nowadays, many techniques test a database management system (DBMS) by generating a set of test databases and then executing queries on top of them. However, for DBMS testing, it would be a big advantage to be able to control the input and/or the output (e.g., the cardinality) of each individual operator of a test query for a particular test case. Reverse Query Processing (RQP) takes a query and a result as input and returns a possible database instance that could have produced that result for that query. RQP also has other applications, such as testing DBMS performance and debugging SQL queries. There are a number of commercial tools that automatically generate test databases. These tools take a database schema (table layouts plus integrity constraints) and table sizes as input and generate new database instances populated with tuples.

Areas of RQP:

- Database Testing
- Software quality assurance

RQP Applications:

1- Generating Test Databases

The application that started this work is the generation of test databases for regression tests or to test the specification of an application. If the application code is available (e.g., Java or C# with embedded SQL), then the application code can be analyzed using data flow analysis in order to find all code paths. Based on this information, RQP can be applied to the SQL statements which are embedded in the application in order to generate a test database that will provide data for all possible code paths.

For example, consider an application with an if-else block where the if condition depends on the result R of a query Q. Given that query Q and different results R (e.g., one R for each branch of the if-else block), RQP can generate different databases to test all code paths of that application (R can be given manually by the testers or by code analysis tools).

foreach price in SELECT price FROM Product do
    if (price >= 0 && price <= 10)
        // do something
    else if (price > 10)
        // do something else
end foreach
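To make the idea concrete, here is a minimal Python sketch using the standard sqlite3 module. It is not the RQP algorithm itself: the query is a plain projection, so "reversing" it by hand just means inserting the desired result rows. The table layout (an `id` column plus `price`) and the helper names `build_product_db` and `branches_taken` are my own assumptions for illustration.

```python
import sqlite3


def build_product_db(prices):
    """Create an in-memory database D such that
    SELECT price FROM Product returns exactly `prices`.
    (Hand-written inversion for this trivial query only.)"""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: the original post only mentions Product.price.
    conn.execute(
        "CREATE TABLE Product (id INTEGER PRIMARY KEY, price REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Product (price) VALUES (?)", [(p,) for p in prices]
    )
    conn.commit()
    return conn


def branches_taken(conn):
    """Run the loop from the pseudocode and record which branches execute."""
    taken = set()
    for (price,) in conn.execute("SELECT price FROM Product"):
        if 0 <= price <= 10:
            taken.add("if")
        elif price > 10:
            taken.add("else-if")
    return taken


# One desired result row per branch: 5 hits the "if", 25 hits the "else if".
conn = build_product_db([5, 25])
covered = branches_taken(conn)
```

With one result value chosen per branch, the generated database exercises both code paths; a database built only from values in [0, 10] would leave the second branch untested.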

2- SQL Debugger

3- Program Verification

4- Database Sampling, Compression

Some databases are large, and query processing might be expensive even if materialization and indexing are used. One requirement might be to provide a compressed, read-only variant of a database that very quickly gives approximate answers to a pre-defined set of parameterized queries.


Problem Definition:

The problem of database application testing can be broadly partitioned into three problems: test case generation, test data preparation, and test outcome verification. Among the three, test case generation directly affects the effectiveness of testing.

Given an SQL Query Q, the Schema SD of a relational database (including integrity constraints), and a Table R (called RTable), find a database instance D such that: R = Q(D) and D is compliant with SD and its integrity constraints.
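The definition can be read as a checkable condition on any candidate D. The sketch below, again in Python with sqlite3, only verifies a candidate; it does not generate one. The function name `is_valid_instance`, the sample schema, and the choice to compare results as multisets (ignoring row order) are my assumptions, not part of the project.

```python
import sqlite3
from collections import Counter

# Assumed example schema SD with one integrity constraint (price >= 0).
SCHEMA = (
    "CREATE TABLE Product ("
    "id INTEGER PRIMARY KEY, "
    "price REAL NOT NULL CHECK (price >= 0))"
)
INSERT = "INSERT INTO Product (id, price) VALUES (?, ?)"


def is_valid_instance(schema_sql, insert_sql, rows, query, expected):
    """Return True if the instance D built from `rows` complies with the
    schema SD and satisfies R = Q(D), comparing rows as a multiset."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(schema_sql)
        conn.executemany(insert_sql, rows)
    except sqlite3.IntegrityError:
        # D violates an integrity constraint of SD.
        return False
    actual = conn.execute(query).fetchall()
    return Counter(actual) == Counter(expected)


ok = is_valid_instance(
    SCHEMA, INSERT, [(1, 5), (2, 25)],
    "SELECT price FROM Product", [(5,), (25,)],
)
```

A candidate with a negative price is rejected because it breaks the CHECK constraint, even if the query result would otherwise match.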


RQP Architecture


Multi Reverse Query Processing:

RQP cannot handle multiple queries and their corresponding expected results as input. Thus, in [6] we studied the problem of Multi-RQP (MRQP for short). Unlike RQP, MRQP takes a set of SQL SELECT queries, the corresponding expected query results, and a database schema as input and tries to generate one test database that returns the expected results for all the given queries.
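The MRQP condition is that a single database must satisfy every (query, expected result) pair at once. The Python sketch below only checks that condition on a hand-built candidate; it does not show how MRQP searches for the database. The names `satisfies_all` and `make_candidate`, and the sample queries, are hypothetical.

```python
import sqlite3
from collections import Counter


def satisfies_all(conn, expectations):
    """Return True if every (query, expected_rows) pair holds on the
    same database, comparing rows as multisets."""
    for query, expected_rows in expectations:
        actual = conn.execute(query).fetchall()
        if Counter(actual) != Counter(expected_rows):
            return False
    return True


def make_candidate():
    """Hand-built candidate database (assumed schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Product (id INTEGER PRIMARY KEY, price INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Product (price) VALUES (?)", [(5,), (25,)]
    )
    return conn


# Two queries with their expected results; one database must serve both.
expectations = [
    ("SELECT COUNT(*) FROM Product", [(2,)]),
    ("SELECT price FROM Product WHERE price > 10", [(25,)]),
]
all_hold = satisfies_all(make_candidate(), expectations)
```

The difficulty of MRQP comes from the fact that the queries constrain each other: adding a third expectation can invalidate a database that satisfied the first two.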


Here, I have tried to describe the topic briefly, so please don't hesitate to contact me for more information.


