|
Fast,
Easy Delta Reporting with SortCL
Because of CoSORT's fast, low-cost
file-system approach to data warehouse integration and staging --
its ability to simultaneously filter, sort, report, and speed DB
loads -- SortCL can be a very simple and cost-effective way to
capture and report on changed data in either a standalone or
integrated framework.
The following SortCL
application is an example of Insert, Update, Delete (i.e. Delta)
Processing:
Based on customer data that changes on a daily basis, a full outer
join is used to produce a report that shows differences between the
old and new sets of data. The "old" and "new"
files are both pre-CoSORTed by Account (Acct). The final report puts
a
change tag in the last column of each record:
Insert -- where a
new record exists for a given Acct value where an old record did not
(that is, a non-matched record). Update -- where the Amount (Amnt) value of the
old and new matching records had changed. Delete -- where an old record
exists for a given Acct value where a new record does not (that is,
a non-matched record). No change -- where
the Amnt value of the old and new record are the same (requires
a matching record).
Shown side-by-side
below are small subsets of
the pre-sorted input files old and new:
AC450
271 AC450 238
AC451 269 AC451 954
AC452 270 AC454 269
AC453 298 AC455 281
AC454 269 BNZ21 322
AC455 281 BNZ22 150
For this example,
assume that the format for the pre-sorted file new is the
same, but the data values are not; some match the old file
values and some do not.
The following SortCL script includes
a full outer join and a conditional field, delta_flag, that is
appended as the last column in the output:
/INFILE=old
/FIELD=(Acct,POS=1,SIZE=5)
/FIELD=(Amnt,POS=7,SIZE=3)
/CONDITION=(old_exist, \
TEST=(old.Amnt NE "
"))
/CONDITION=(old_nonexist, \
TEST=(old.Amnt EQ "
"))
/INFILE=new
/FIELD=(Acct,POS=1,SIZE=5)
/FIELD=(Amnt,POS=7,SIZE=3)
/CONDITION=(new_exist, \
TEST=(new.Amnt NE " "))
/CONDITION=(new_nonexist, \
TEST=(new.Amnt EQ "
"))
/CONDITION=(update, \
TEST=(old.Amnt NE new.Amnt))
/CONDITION=(no_update, \
TEST=(old.Amnt EQ new.Amnt))
/JOIN
FULL_OUTER old new WHERE \
old.Acct EQ
new.Acct
/OUTFILE=delta.out
/FIELD=(old.Acct,POS=1, SIZE=5)
/FIELD=(old.Amnt,POS=7,
SIZE=3)
/FIELD=(new.Acct,POS=11,SIZE=5)
/FIELD=(new.Amnt,POS=17,SIZE=3)
/DATA="
"
/FIELD=(delta_flag,POS=21, \
IF update AND old_exist \
AND new_exist THEN
"Update"\
ELSE IF old_nonexist AND\
new_exist THEN
"Insert" \
ELSE IF old_exist
AND \
new_nonexist THEN
"Delete" \
ELSE "")
# No change
Here is a subset of
the output file, delta.out::
AC450 271 AC450 238 Update
AC451 269 AC451 954 Update
BNZ21 322 Insert
BNZ22 150 Insert
AC452 270 Delete
AC453 298 Delete
AC454 269 AC454 269
AC455 281 AC455 281
The
full outer join produces the results of the right outer join
(Insert) and the left outer join (Delete). The full outer join
also returns the matches, where the Update and " " (empty) cases
are determined by comparing the Amnt values of the matching
records.
Of course, this example is greatly
simplified in terms of data content and file formatting, but larger
sources and larger, more detailed (and aggregated) output is always possible.
Please email your questions to support@iri.com.
|