Introduction to Preference SQL

Preference SQL Syntax Block

A Preference SQL query block has the following schematic design:

SELECT <projection, aggregation>
FROM <table_reference>
WHERE <hard_conditions>
PREFERRING <soft_conditions>
GROUPING <attribute_list>
BUT ONLY <but_only_condition>
TOP <number>
GROUP BY <number>
HAVING <aggregating_hard_conditions>
ORDER BY <attribute_list>
LIMIT <number>
 

Preference SQL supports a rich repertoire of intuitive preference constructors inside the PREFERRING clause.

Base preference constructors

  • categorical domains: POS, NEG, LAYERED, ...
  • numerical domains: SCORE, AROUND, ...
  • full-text domains: CONTAINS
  • spatial domains: NEARBY, ONROUTE, ...
  • temporal domains: EARLIEST, BETWEEN, ...

Complex preference constructors

  • Pareto, Skyline: AND
  • Prioritization: PRIOR TO
  • Ranking with utility function: RANK
  • Reverse preference order: DUAL

Preference construction is closed under strict partial orders. Additional constructors can be added as required.

For a full syntax we refer to Preference SQL Syntax.

PREFERRING

The PREFERRING clause expresses the application of a preference to the outcome of queries. A preference can be constructed by the preference constructors presented below. These preference constructors are evaluated on the result of the hard constraints stated in the WHERE clause, returning the BMO-set of a preference P. Empty result sets can only occur in cases where all tuples have been filtered out by the hard conditions. Note that in this way Preference SQL implements a generalization of constrained Skyline queries EK11.

All preferences can be used with 'regular' oder 'trivial' SV-semantics. 'regular' uses regular SV-semantics, otherwise trivial semantics, cp. Kie05. In general, using trivial SV-semantics means that one value may be better than, equal to, worse than, or incomparable to another value given in a preference, but not substitutable. Therefore, two values are only substitutable, if they are equal.

Using only base preferences, this makes no difference to the result set. The difference occurs when such preferences are combined to complex preferences. When, e.g. in a LAYERED preference, values of the same layer are incomparable, the value of this layer alone is not suffcient to determine domination. So using regular SV-semantics allows to daclare values 'substitutable' or 'equally good'. See Kie05 for a detailed description of regular and trivial SV-semantics. In doubt, always use the keyword REGULAR after each preference. This is the more intuitive preference evaluation type.

An example for a preference is:

SELECT * FROM car PREFERRING price LOWEST
 

This query retrieves all cars which have the lowest price in the data set.

GROUPING

The GROUPING clause confines the values among that a preference looks for the best to those identical in the grouping attributes. One does not get the overall best values but the best among groups defined by partitioning the tuples according to their values for the grouping attributes. The GROUPING clause has either the same argument list as the standard GROUP BY clause or contains a Grouping-SV specification. Consider an example for Grouping on a single attribute:

SELECT * FROM car PREFERRING price LOWEST GROUPING make
 

An example for a SV-Grouping query, where cars are grouped w.r.t. to their manufacturers:

SELECT manufacturer, price
FROM car
PREFERRING price LOWEST
GROUPING make SV ( ('Mercedes', 'Smart', 'AMG') AS 'Daimler', ('VW', 'Audi', 'Bugatti') AS 'Volkswagen') AS manufacturer
 

See the Grouping specification for more details on this issue.

BUT ONLY

Sometimes the filtering effect of a preference is not sufficient, e.g. because many tuples are incomparable, leading to a very large result set. One possibility to reduce the number of results is to use a WHERE clause leading to so called constrained Skyline queries, where the preference term is evaluated after the hard selection. For example:

SELECT * FROM car WHERE price <= 18000 PREFERRING age AROUND 4
 

The BUT ONLY clause of a query closely resembles the WHERE clause. However, the BUT ONLY clause is evaluated after the PREFERRING / GROUPING clause, i.e., only tuples that have passed the filter of the PREFERRING clause can be selected by the BUT ONLY clause. This is also the semantics of Skyline queries with constraints EK11. For example:

SELECT * FROM car PREFERRING age AROUND 4 BUT ONLY price <= 18000
 

The next example shows different result sets by exchanging the WHERE clause by the BUT ONLY clause, let's take a look at the following car relation:

name price fuel
car144 12000 4.0
car136 8000 6.0
car235 5000 10.0

Case 1: Evaluation of the WHERE Clause BEFORE the Preference Clause.

SELECT name, price, fuel
FROM car
WHERE price < 10000
PREFERRING fuel LOWEST
 

Leads to the following result set:

name price fuel
car136 8000 6.0

Case 2: Evaluation of the BUT ONLY Clause AFTER the Preference Clause leads to an empty result set.

SELECT name, price, fuel
FROM car
PREFERRING fuel LOWEST
BUT ONLY price < 10000
 

Quality functions may be used in the BUT ONLY clause to further reduce the result set by guaranteeing good quality levels or small distances.

TOP k

Since customers are accustumed to get plenty of results the result size of the BMO evaluation may be too tiny to meet their expectations. As an issue, the next best tuples are generated by an iterative multi-level BMO evaluation as being described by EP15. Thus, the TOP clause garantees the delivery of k tuples if the tuple size is not less than k after the evaluation of the WHERE clause. If the BMO set is greater than k then only k tuples are returned. If the BMO set is less than k the next best tuples are generated by a multi-level BMO evaluation until the restriction is fulfilled that For example:

SELECT * FROM car PREFERRING age AROUND 4 TOP 5
 

GROUP BY / HAVING / ORDER BY

The GROUP BY / HAVING clause is used for presentation of the result tuples retrieved by preference selection. Preference evaluation on grouped tuples have to be done in the GROUPING clause. The ORDER BY clause is used to order the tuples w.r.t. an attribute.

LIMIT

Sometimes it is desirable to limit the amount of retrieved tuples, e.g. because one wants to present the result on a mobile device. If a LIMIT <number> is given, at maximum <number> rows will be returned. For example:

SELECT * FROM car WHERE price <= 18000 PREFERRING age AROUND 4 LIMIT 3
 

will return 3 rows; it will return less if the rest of the query does.

The LIMIT clause is evaluated after the ORDER BY clause, i.e., if no ORDER BY clause is given, LIMIT will return an unpredictable subset of the query's rows.

The semantics of Preference SQL evaluation

Preference SQL follows the best matches only (BMO) semantics (Kie02). That means, when a database is queried on base of some user preference, those tuples that best matches the user's wishes are returned.

Best Matches Only (BMO)
For a preference P on a database schema R preference selection is defined as

Preference selection retrieves all maximal values t from the instance of a database relation R. If none exists, it delivers best matches alternatives, but nothing worse. The result of preference selection (also named "winnow"-operator in Cho03) is called "BMO-set".

Preference selection is defined by means of negations, hence it defines a form of non-monotonic reasoning. It offers a cooperative query answering behavior by automatic matchmaking: The BMO query result adapts to the quality of the data in the database, defeating the empty result effect and reducing the flooding effect by filtering off worse results. Thus preferences are treated as soft constraints.

Example

The following table presents a database relation which contains some cars.

ID Make Color Price Age Horsepower Fuel
1 vw silver 16000 2 80 5.2
2 vw white 11000 5 75 7.7
3 toyota red 7000 8 80 6.4
4 toyota blue 3000 12 65 5.5
5 honda black 20000 1 80 5.0

The Pareto preference:

SELECT * FROM car PREFERRING age AROUND 4 AND horsepower BETWEEN 80 AND 100
 

leads to the following BMO-set:

ID Make Color Price Age Horsepower Fuel
1 vw silver 16000 2 80 5.2
2 vw white 11000 5 75 7.7

Both tuples are better than all other tuples. Car 1 and Car 2 are equally good. Car 2 is better concerning the age, but worse concerning the horsepower in comparison to Car 1.

Using preference selection we can define the semantics of preference evaluation as follows:

  • T_1:=σ_H(R_1×⋯×R_n)
    H denotes a hard constraint on the Cartesian product of n relations R
  • T_2:=σ[P](T_1)
    A preference P is evaluated on the result T_1.
  • T_3:=π(σ_{but only}(T_2))
    The BUT ONLY filter allows to present only results of T_2 passing this filter.