Friday, March 23, 2012

Many Year Payments

I need to create a stored procedure that will show all clients that have made payments every year for a number of years (3) or (5). My tables are simple but I have no clue how to ask the SQL question, mostly because it is each and every year.

tblClients ClientID (pk)
tblOrders OrderID (pk) ClientID (fk)
tblOrderPayments PaymentID (pk) OrderID (fk)

Thanks for any help,Depends on how you would like the output. Also: In your tables there's no date. And is not possible for one payment to apply to many orders?

And before I start: You do not 'ask SQL questions', rather you 'construct SQL queries'. And lastly: Always use singular names for your tables, e.g. Client and not Clients, since all tables can hold multiple records all of your table names end up being plural and so the plural becomes redundant.

Anyway. I'll use some common sense to fill in the gaps in your question: I'll assume the payment date rather than the order date is the important one, and I'll assume therefore there's a date on the payment table. Also I'll assume for a moment that your DB design is correct although I doubt it, since it's almost certainly the case that several orders can be paid with one order. I do not know of businesses that does not allow you to pay off several orders in one payment.

So your question as stated is like follows: You want a list of all clients that's made at least one payment a year for the last X years. Is that right? Sounds like a weird question for a business to ask...

If this is indeed what they want, it's a rather interesting SQL problem. I'll give you a few solutions just for my own amusement, even though I suspect they'll be useless for you because your DB design is wrong and you probably misunderstood the needs of your management.

Here's my first attempt:


set @.EndYear = 2004

select CustomerID,
CustomerName
from Customer C
where CustomerID in (select distinct CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and datepart(year, P.Date) = @.EndYear)
and CustomerID in (select distinct CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and datepart(year, P.Date) = @.EndYear - 1)
and CustomerID in (select distinct CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and datepart(year, P.Date) = @.EndYear - 2)


It will work for the 3 year case, but sucks bigtime. If you have a small database (less than 1 million orders) then this will be the way to go since it's easy to understand and you probably won't notice and bad performance. If one of my guys wrote this query for me I'd not be impressed though.
It sucks because:
- Only works for a set number of years. You'll have to write 2 of them to cater for 3 or 5 year queries
- Uses IN with large sets, several times. Bad for performance.
It's not all bad, though. It's cool because:
- It's simple to understand for anyone out there
- Does not use cursors or temp tables

Here's a slightly better one:


set @.EndYear = 2004

select CustomerID,
CustomerName
from Customer C
join (select distinct CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and datepart(year, P.Date) = @.EndYear) as Y1
on C.CustomerID = Y1.CustomerID
join (select distinct CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and datepart(year, P.Date) = @.EndYear - 1) as Y2
on C.CustomerID = Y2.CustomerID
join (select distinct CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and datepart(year, P.Date) = @.EndYear - 2) as Y3
on C.CustomerID = Y2.CustomerID


The same as the first one, but without the expensive IN statements. Still uses a lot of DISTINCTs though, and you still have to write 2 queries for the 3 and 5 year cases. Let's extrapolate this query to one that will work for any number of years from 1 to 5:

set @.EndYear = 2004
set @.NoYears = 4

select CustomerID,
CustomerName
from Customer C
join (select CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and datepart(year, P.Date) = @.EndYear) as Y1
on C.CustomerID = Y1.CustomerID
join (select CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and (datepart(year, P.Date) = @.EndYear - 1
or @.NoYears < 2)) as Y2
on C.CustomerID = Y2.CustomerID
join (select CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and (datepart(year, P.Date) = @.EndYear - 2
or @.NoYears < 3)) as Y3
on C.CustomerID = Y3.CustomerID
join (select CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and (datepart(year, P.Date) = @.EndYear - 3
or @.NoYears < 4)) as Y4
on C.CustomerID = Y4.CustomerID
join (select CustomerID
from Order O
join Payment P
on O.OrderID = P.OrderID
and (datepart(year, P.Date) = @.EndYear - 4
or @.NoYears < 5)) as Y5
on C.CustomerID = Y5.CustomerID


Whoa! That's a good one eh? Again, it will work, but I'll probably get blasted by the community here for even thinking about posting **** like that. The query will work for any number of years to check from 1 to 5, so you can do your 3 and 5 year queries on it just fine without the need to write 2 seperate queries.

Ok, enough of that. Now let's try another approach: Instead of doing it all in one SQL statement, we'll create a temp table with all CustomerIDs in it. Then we'll iterate through the years we want to check, and remove all customers that did not make a payment in each year. So in the end we're left with only the customers that made payments in all the years checked.


set @.EndYear = 2004
set @.NoYears = 4
set @.I = 0

select CustomerID
into #Temp
from Customer

while (@.I < @.NoYears)
begin

delete T
from Order O
join Payment P
on O.OrderID = P.OrderID
and datepart(year, P.Date) = @.EndYear - @.I
right join #Temp T
on O.CustomerID = T.CustomerID
where P.OrderID is null

set @.I = @.I + 1

end

select C.CustomerID,
C.CustomerName
from Customer C
join #Temp T
on C.CustomerID = T.CustomerID


This is also a nice and simple solution, and will probably give you fairly decent performance. The good thing about it is that it will work for ANY number of years, although the more years you do the longer it will take. It's also sexy because it uses a RIGHT JOIN, my favourite type of join. :)

I'm going to stop here, but there are almost certainly more elegant solutions out there. I can imagine one where you use a helper table for all the years you want to check, and then do a distinct and a having count on a simple join in a sub-query - it will work in a single query and will be able to do an arbitraty number of years. I'll challenge anyone else out there to give it a shot and I'll post it in a day or two.|||Hi i think i found a nice solution for you :).

Data: i used the Northwind database for testing, i created a payments table containing customerid, orderid and paymentdate and populated it with testdata using the orders table.


DECLARE @.YearSpan int
SET @.YearSpan = 3

SELECT
YearsOfPayment.CustomerID,
Sum(FirstYear) As FirstYearPayments,
Sum(SecondYear) As SecondYearPayments,
Sum(ThirdYear) As ThirdYearPayments,
Sum(FourthYear) As FourthYearPayments,
Sum(FifthYear) As FifthYearPayments

FROM
(
Select
Customers.CustomerID,
(
Case When Payments.Paymentdate Between '01/01/1996' And '12/31/1996' Then 1 Else 0 End
) AS FirstYear,
(
Case When Payments.Paymentdate Between '01/01/1997' And '12/31/1997' Then 1 Else 0 End
) AS SecondYear,
(
Case When Payments.Paymentdate Between '01/01/1998' And '12/31/1998' Then 1 Else 0 End
) AS ThirdYear,
(
Case When Payments.Paymentdate Between '01/01/1999' And '12/31/1999' Then 1 Else 0 End
) AS FourthYear,
(
Case When Payments.Paymentdate Between '01/01/2000' And '12/31/2000' Then 1 Else 0 End
) AS FifthYear
From
Customers
Inner Join Payments On Payments.CustomerID = Customers.CustomerID
) AS YearsOfPayment

GROUP BY
YearsOfPayment.CustomerID

HAVING
SUM(FirstYear) > 0
AND SUM(SecondYear) > 0
AND SUM(ThirdYear) > 0
AND SUM(FourthYear) > (CASE WHEN @.YearSpan > 3 THEN 0 ELSE -1 END)
AND SUM(FifthYear) > (CASE WHEN @.YearSpan > 4 THEN 0 ELSE -1 END)

Todo:
You have to replace the hardcoded dates with a @.FirstYear Parameter and modify the WHEN
clauses accordingly ;)

Good things here:
- if you need to check for more years, just modify it to return an additional CASE column
- performance won't change with the number of years checked (= the number of CASES you return since no subselects are involved

Further Tuning:
- to speed it up a even little more you can create and index on the paymentdate and put a WHERE statement in the "inner" SELECT to only check the years between @.firstdate and @.firstdate + @.yearspan

- Moon|||:: And before I start: You do not 'ask SQL questions', rather you 'construct SQL queries'.

... and before you post again, you should rethink your arrogant attidue and stop correcting someone since most people will find this very offensive.

:: And lastly: Always use singular names for your tables, e.g. Client and not Clients,
:: since all tables can hold multiple records all of your table names end up being plural
:: and so the plural becomes redundant.

... from OO point of view where you think of collections and items, a table is a collection and holding items (records) and therefore it is 100% valid to use plural.|||Moon's solution is novel, but only works for up to a certain number of years. Here's my solution that will work for an arbitrary number of years, with only one query and no temp tables or cursors:


set @.LastYear = 2004
set @.NumYears = 5

select D.CustomerId,
C.[Name]
from Customer C
join (select distinct O.CustomerId,
datepart(year, P.[Date]) as PaidYear
from [Order] O
join Payment P
on O.OrderId = P.OrderId
and datepart(year, P.[Date]) <= @.LastYear
and datepart(year, P.[Date]) > @.LastYear - @.NumYears) as D
on C.CustomerID = D.CustomerID
group by D.CustomerID,
C.[Name]
having count(*) = @.NumYears

Who's your daddy! $5 for anyone with a more elegant solution.
And if I come off as arrogant, feel free to skip my comments and just look at my code.|||Relational databases are NOT OO and therefore your plural argument does not hold.

From IDEF1X

December 21, 1993

Section: 3.1.2 Entity Syntax

...The entity name is a noun phrase that describes the set of things the entity
represents. The noun phrase is in singular form, not plural. Abbreviations and
acronyms are permitted, however, the entity name must be meaningful and consistent
throughout the model...

IE and other data modeling standards organisations agree with the above statement. Also I was taught the same thing at University in my first year. I was under the impression that this is common knowledge.|||*points at the year, cough, :P*

well as long as Microsoft uses the plural convention in their own databases: master, Northwind, Pubs, ... i imply that there is a point in using plural for his own database objects as well ;)

Also i didn't say that using singular is a complete NoNo, just that using plural is not forbidden but valid to use.

PS: University != real life ;). I also was taught a few things in university such as C++ that are invalid by now - since when i learned C++, it was still a draft, without namespaces, and whatnot and if i would need to code in C++ today it would be a nearly a full restart.|||Yeah this one is nice :)

And sorry my yesterdays comment was rude as well :S.|||Like I said: If you start naming tables with plurals, then in the end ALL your tables (or at least a very large percentage) will end up being plural, and so the 's' at the end of every table becomes just another letter you have to type in every line of SQL code and serves no purpose whatsoever. Remember that 99% of all tables will have more than one row and therefore should in your argument be plural in the naming.

And who ever said MS wrote good code, eh?

The date may be 10 years old, but please feel free to provide me with a more recent reference from an SQL standards authority that says 'Yea, we made a mistake in '93, go ahead and use plurals now'

I disagree with you. There's no good reason to use plurals. Using plurals is a nono. Except if you're MS in which case you are expected to write bad code and thus are allowed plurals in table names.

No comments:

Post a Comment