Boundary-case data coverage for numeric predicates #31

@tohpinren

I have a clarification question about how boundary conditions in numeric predicates are intended to be handled in BIRD, and whether adding minimal boundary-case data would be desirable.

For Question 336, the gold SQL query is:

SELECT (
    CAST((
        SELECT COUNT(*)
        FROM posts
        INNER JOIN users ON posts.OwnerUserId = users.Id
        WHERE users.Reputation > 1000
          AND strftime('%Y', users.CreationDate) = '2011'
    ) AS FLOAT)
    /
    CAST((SELECT COUNT(*) FROM posts) AS FLOAT)
) * 100 AS percentage

If the Reputation predicate is relaxed to:

users.Reputation >= 1000

both queries currently return the same result on the existing database:

[(11.153034817215058,)]

This appears to be because no user in the database has Reputation = 1000, a CreationDate in 2011, and at least one post, so an execution-based comparison cannot distinguish > 1000 from >= 1000.
If a single boundary-case row satisfying these conditions (Reputation = 1000, CreationDate in 2011, and owning at least one post) were added, the two queries would return different results and the distinction would become observable.
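To illustrate the effect, here is a minimal sketch on a toy in-memory SQLite database. The schema holds only the columns the gold query touches, and every row is made up for illustration; none of it is taken from the actual BIRD database. The helper `run` and the sample IDs are hypothetical.

```python
import sqlite3

# Gold query from Question 336, with the comparison operator left as a placeholder.
QUERY = """
SELECT (
    CAST((
        SELECT COUNT(*)
        FROM posts
        INNER JOIN users ON posts.OwnerUserId = users.Id
        WHERE users.Reputation {op} 1000
          AND strftime('%Y', users.CreationDate) = '2011'
    ) AS FLOAT)
    /
    CAST((SELECT COUNT(*) FROM posts) AS FLOAT)
) * 100 AS percentage
"""


def run(conn, op):
    """Run the query with the given comparison operator ('>' or '>=')."""
    return conn.execute(QUERY.format(op=op)).fetchone()[0]


conn = sqlite3.connect(":memory:")
# Toy data (assumed): no user has Reputation = 1000 AND a 2011 CreationDate.
conn.executescript("""
CREATE TABLE users (Id INTEGER PRIMARY KEY, Reputation INTEGER, CreationDate TEXT);
CREATE TABLE posts (Id INTEGER PRIMARY KEY, OwnerUserId INTEGER);
INSERT INTO users VALUES (1, 1500, '2011-03-01 00:00:00'),
                         (2,  900, '2011-06-01 00:00:00'),
                         (3, 1000, '2012-01-01 00:00:00');
INSERT INTO posts VALUES (1, 1), (2, 2), (3, 3), (4, 1);
""")

before_gt = run(conn, ">")
before_ge = run(conn, ">=")
print("before:", before_gt, before_ge)  # the two predicates agree

# Add the boundary case: Reputation = 1000, created in 2011, owns one post.
conn.execute("INSERT INTO users VALUES (4, 1000, '2011-09-01 00:00:00')")
conn.execute("INSERT INTO posts VALUES (5, 4)")

after_gt = run(conn, ">")
after_ge = run(conn, ">=")
print("after: ", after_gt, after_ge)  # the two predicates now disagree
```

In this sketch, inserting a new post also increases the denominator `COUNT(*) FROM posts`, so the result of the gold query itself changes. The stored expected answer would therefore need to be regenerated alongside any such data change.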

Question:
Would it be appropriate to add such a boundary-case row to the database to distinguish these two queries, and if so, should this be done via a small PR?
