In
this issue:
|
| Latest
CoSORT Release |
Version
8.2.2 is Here
IRI
has again updated CoSORT for UNIX and Windows. Version 8.2.2: improves contiguous key sort performance; helps
system administrators manage sort memory; updates MVS2SCL,
introduces or improves on SortCL
functionality: sub-strings, search and replace, and field length sensing.
8.2.2 is a minor update to 8.2.1 and
renames the cob/csv/ctl/elf2scl converters
to cob/csv/ctl/elf2ddf. All
SortCL executable changes are also reflected in gui2scl
and the sortcl_routine()
API.
Click
here to contact IRI and get access to the 8.2.2 readme
file that contains the confidential list of developments since 8.1.3.
Ask about CoSORT's tie-in to FAst
extraCT for Oracle, and IRI's new data synthesizer and full file/format
simulator, RowGen.
|
|
| Latest
CoSORT Certification |
| Novell
'YES' Certifies Linux CoSORT on SLES 9
Soon after
CoSORT became one of the first commercial applications ported
to SUSE LINUX Enterprise Server (2.6 kernel) on IBM's
new POWER5 i and pSeries platforms, Novell contacted IRI to
formally certify CoSORT for SLES 9. As further testament to IRI's
now 20-year commitment to UNIX standards, CoSORT met Novell's
stringent YES certification requirements on yet another SLES9
platform -- 64-bit Itanium (HP RX1600).
Linux CoSORT is available for SLES on IBM i, p and zSeries mainframe
platforms, as well as Intel x86 and Itanium processors. CoSORT is
also available and proven under Red Hat Linux (7-9) on IBM and Intel
systems, as well as other open (and closed) source UNIX derivates --
even FreeBSD. Click
here for the platform list.
|
|
| RowGen
> SQL*Loader = Large Tables Fast |
IRI's
newest data management product -- RowGen
-- provides a fast and easy way to simultaneously synthesize custom files
and populate huge tables. The example below shows how to stream data
directly from RowGen into
a database or other jobs. By
simultaneously generating and piping data to the next application, you do not have to wait for
files to close and reopen before the next process -- just as
in CoSORT's Oracle ETL example piping fact
| sortcl | sqlldr!
The example also shows how you
to use existing SQL*Loader control file statements for RowGen-supported /FIELD layouts.
IRI's free ctl2ddf utility creates a data definition
file from Oracle columns for central reference use by RowGen (or SortCL)
users.
Consider an empty Oracle table,
inventory, which you might populate nightly with new data.
Its description is:
Name
Type
--------- ----------------
CODE CHAR(5)
PRICE NUMBER(6,2)
PART VARCHAR2(12)
Its control file, stream_in.ctl, is used to load the data into the table:
LOAD DATA
INFILE 'outstream.dat'
TRUNCATE
INTO TABLE inventory
TRAILING NULLCOLS
(code position(0001:0005) char,
price position(0006:0012) DECIMAL EXTERNAL,
part position(0013:0024) char)
To generate the equivalent layouts for RowGen use, run the included
'ctl2ddf' utility: ctl2ddf
stream_in.ctl
This produces a re-useable RowGen metadata repository:
/FILE=outstream.dat
/FIELD=(code, POS=1, SIZE=5)
/FIELD=(price, POS=6, SIZE=7.2, NUMERIC)
/FIELD=(part, POS=13, SIZE=12)
As you can see:
The name inventory.ddf came from the LOAD table entry in the control file.
The .ddf's /FILE entry uses the INFILE name from the control file.
To generate more meaningful data, we'll change one of the field statements in
inventory.ddf to reference SET file values instead. That is, change the part field statement to the following:
/FIELD=(part,SET=pts.set,POS=13,SIZE=12)
If you do not make this change, the example will run, but the part
field would consist of randomly-generated (vs. selected) ASCII characters.
The RowGen job script, stream_out.rcl, relies on the .ddf FIELD
layouts:
/SPEC=inventory.ddf # use central layouts
/INFILE=real_file #
future placeholder
/INCOLLECT=15000000 # generate
15M rows
/OMIT WHERE price < 0 # no negative values
/KEY=code
# query field sort
/OUTFILE=outstream.dat # feeds SQL*Loader
This script will generate 15 million records to load, ordered
by 'code.' Many more output files in different sizes and
formats could have been created at the same time. The input file placeholder is there in case you will run
the compatible SortCL
application on real input data in the future!
Single-pass
your data generation and Oracle load via stream.bat:
mkfifo outstream.dat
rowgen /spec=stream_out.rcl |
sqlldr control=stream_in.ctl DIRECT=TRUE
RowGen and SQL*Loader processes begin simultaneously, so the entire pre-sorted
load happens very quickly. To verify the results, check the contents of the
new table in Oracle 9i, which now has 15 million rows:
>SELECT * from
inventory WHERE rownum < 5;
CODE PRICE PART
----- ---------- ------------
AAACJ 6277.37 switches
AAAKO 5332.25 sanders
AAAwB 2312.32 plyers
AABCM 7628.62 lightbulbs
With just
the first four rows shown, you can see that the:
"CODE" field is ordered alphabetically
"PRICE" field contains no negative values
"PART" field contains randomly selected values from a set
file.
A multi-CPU RowGen license will improve sort and load performance.
|
|
| About
CoSORT and The CoSORT Journal |
- CoSORT solutions
serve data warehouse (ETL)
architects, very large database (VLDB)
administrators, mainframe sort migrators,
and developers and independent software vendors (ISVs) building
faster sorting and data transformation into their applications.
- CoSORT delivers the IT
industry's fastest UNIX sort engine and one of its most powerful
flat-file manipulation and reporting programs: SortCL,
which combines: row filtering and conditional selection,
sort/merge and joins, drill-down aggregation and cross-row
calculation, conversion and generation of more than 100 data
types, database sequencing, and multi-target, multi-level output
reformatting for reports,
hand-offs, and DB load utilities.
- Other special features
include: coroutine sort architecture; fully tunable and scalable
parallel sort performance
on all multi-CPU UNIX and Windows servers; cross-calculation on
aggregated values and aggregation on cross-calculated values,
cross-table joins (matching) integrated with data conversion and
expression logic; multinational date and timestamp support; a
cross-platform Java GUI;
and, e-commerce
reporting via CSV/CLF and IP Address manipulation, as well as
ELF/HTML input/output, respectively.
- CoSORT also has
plug-n-play replacements
or parameter converters for sorting in: ACUCOBOL-GT,
Amdocs Ensemble (telecom billing) Ascential
DataStage; Informatica
PowerCenter and PowerMart; Cincom Supra; IBM's
DB2 loader and MVS/VSE
sorts; MF
COBOL Workbench, Net and Server Express; SAS
System; Software AG Natural;
Sun
MRP; and, UNIX
SVR4 (/bin/sort).
- IRI has begun to offer
other data manipulation and management solutions like: FACT
for fast unloads from Oracle; netCONVERT
for mainframe data conversion and reformatting; x-PRESS
for fast, and secure data compression and decompression; RowGen
for custom-formatted random file generation; Logon
for controlling and auditing access to UNIX systems; and Permitas
for licensing and activating software applications.
- The CoSORT
Journal is a quarterly Email newsletter designed to keep
subscribers up-to-date on salient news and events at CoSORT/IRI,
Inc. Past newsletters are archived for 2 years here.
|
Copyright ©
2005 Innovative Routines International (IRI), Inc. All rights reserved. CoSORT, SortCL,
Permitas, and Rowgen are
trademarks of IRI. FACT is a trademark of CoSORT Korea,
Ltd. All other names mentioned herein may be copyrights or registered copyrights of their respective owners.
To
remove an Email address from future CoSORT Journal mailings,
please email
news@iri.com. To
subscribe, please provide your contact details here. To contact an IRI
agent for assistance, call 1-800-333-SORT, or email cosort@iri.com. To find the IRI agent in your
country, click here.
|
|
Question: What's
Fast, Cost-Effective ETL?
Ever
since Database Trends Magazine declared CoSORT 'The Emerging ETL Engine'
in 1999 -- and now amid growing interest in IRI's piped
Oracle ETL solution; i.e. fact
| sortcl | sqlldr -- the goal of fast but affordable VLDW ETL has
driven IRI development.
You can use IRI tools in a variety of ways
to make light work of high-volume ETL operations, and uniquely optimize
and/or combine these individual elements in a simple, inexpensive
way.
If you
run Oracle on UNIX, you're especially in luck:
How IRI's Product Combo is an Oracle ETL Solution:
CoSORT's FAst extraCT (FACT)
product for Oracle rapidly Extracts tables
to faster flat-file formats. FACT can also pipe this data to, and create the
extract's metadata for, CoSORT's Sort Control Language (SortCL)
program and SQL*Loader. SortCL then runs one or more simultaneous (or
standalone/batch) Transformation jobs and reports.
Multiple filtered, reformatted and CoSORTed output targets include:
files, reports, and pipes to Loading tools
like Oracle's SQL*Loader.
This approach is simple, less expensive, and faster than other ETL tools
-- (reusable) metadata build automatically and the jobs run outside the
DB!
How
IRI Tools Perform & Speed ETL Jobs Generally:
1) CoSORT's FACT performs rapid Extractions.
2) CoSORT's SortCL
UIs perform, speed, and
combine Transformations in 1
pass like:
Select-Sort-Join-Calc-Convert-Aggregate-Remap
3) CoSORT speeds bulk Loads by pre-sorting.
4) CoSORT's SortCL
sets up and runs faster
than equivalent ETL or PL/SQL procedures.
5) Flat files are the fastest way to stage very
large volumes of warehouse and ODS data.
If
you already use a conventional ETL tool, CoSORT will make its high-volume
operations more efficient:
How
IRI Technologies Speed Other ETL Products:
1) Easy metadata interface to ETI*Extract
2) Dual-mode PlugIn for DataStage
XE sort
3) Plug'n'Play AEP for Informatica
Sorter Tx
4) Direct drop-in sort replacements for SAS,
Software AG Natural,
and top COBOL
tools.
And
remember, if you also need to produce reports in or outside of the ETL
context, CoSORT's Sort Control Language (SortCL) program is a
full-function, custom report
generator:
Please
click
here to link to the details, or contact
us with any questions or comments.
|
| Tech
Tip |
| Using
SortCL to Rank Data
CoSORT Sort Control Language (sortcl)
program supports a wide range of data aggregation and
expression functions. Did you know that the count feature could also
be used for ranking?
Suppose you want to rank salespeople
by the value of sales they have made, with the highest sales ranking
first. The following file contains names of salespeople and their
associated sales (in thousand dollar amounts).
Mary 23
Robert 129
John 345
Vanessa 31
Donald 345
Sarah 54
Laura 45
Henry 98
Richard 31
Tom 29
Nancy 18
Barbara 32
The sortcl script below ranks the salespeople
in order of sales made. Notice that the sort is in descending order
so that highest sales values are returned first. Also, the rank
field is declared as NUMERIC so that the values will be right-justified.
/INFILE=salesbyname
/FIELD=(person,POS=1,SIZE=11)
/FIELD=(sales,POS=13,SIZE=3)
/SORT
/KEY=(sales,DESCENDING)
/OUTFILE=sale_rank.out
/FIELD=(rank,POS=1,SIZE=2.0,NUM)
/FIELD=(person,POS=4,SIZE=11)
/FIELD=(sales,POS=15,SIZE=3)
/COUNT rank RUNNING WHERE sales
That script produces this output:
1 John 345
1 Donald 345
2 Robert 129
3 Henry 98
4 Sarah 54
5 Laura 45
6 Barbara 32
7 Richard 31
7 Vanessa 31
8 Tom 29
9 Mary 23
10 Nancy 18
Notice how
equal performers get equal rankings. For more information on using
sortcl for ranking, see the CoSORT 8.2.2 User Manual, or contact an
IRI engineer directly via support@iri.com.
|
|