The CoSORT Company respects your online time and privacy. You received this quarterly newsletter as elected or forwarded information related to data sorting, data warehousing, database administration, mainframe migration, and/or your work with CoSORT Sort/ETL solutions. To stop or start future quarterly mailings, click here or on the links at the bottom of this message. Please forward this newsletter to anyone interested in our company or the enterprise data processing marketplace.

 

             
The CoSORT Journal: Data Sorting and ETL News
                                                                                           Quarter 1, 2005
In this issue:


CoSORT 8.2.2 is Here
Novell Certifies Linux CoSORT
FAQ: Fast & Cost-Effective ETL?
Build Tables Fast with RowGen!
Tech Tip: Ranking 
Latest CoSORT Release
Version 8.2.2 is Here 

IRI has again updated CoSORT for UNIX and Windows. Version 8.2.2: improves contiguous key sort performance; helps system administrators manage sort memory; updates MVS2SCL, introduces or improves on SortCL functionality: sub-strings, search and replace, and field length sensing. 8.2.2 is a minor update to 8.2.1 and renames the cob/csv/ctl/elf2scl converters to cob/csv/ctl/elf2ddf. All SortCL executable changes are also reflected in gui2scl and the sortcl_routine() API.

Click here to contact IRI and get access to the 8.2.2 readme file that contains the confidential list of developments since 8.1.3. Ask about CoSORT's tie-in to FAst extraCT for Oracle, and IRI's new data synthesizer and full file/format simulator, RowGen
Latest CoSORT Certification

Novell 'YES' Certifies Linux CoSORT on SLES 9 
           Soon after CoSORT became one of the first commercial  applications ported to SUSE LINUX Enterprise Server (2.6 kernel) on IBM's new POWER5 i and pSeries platforms, Novell contacted IRI to formally certify CoSORT for SLES 9. As further testament to IRI's now 20-year commitment to UNIX standards, CoSORT met Novell's stringent YES certification requirements on yet another SLES9 platform --  64-bit Itanium (HP RX1600).

Linux CoSORT is available for SLES on IBM i, p and zSeries mainframe platforms, as well as Intel x86 and Itanium processors. CoSORT is also available and proven under Red Hat Linux (7-9) on IBM and Intel systems, as well as other open (and closed) source UNIX derivates -- even FreeBSD. Click here for the platform list.

RowGen > SQL*Loader = Large Tables Fast

IRI's newest data management product -- RowGen -- provides a fast and easy way to simultaneously synthesize custom files and populate huge tables. The example below shows how to stream data directly from RowGen into a database or other jobs. By simultaneously generating and piping data to the next application, you do not have to wait for files to close and reopen before the next process -- just as in CoSORT's Oracle ETL example piping fact | sortcl | sqlldr!

The example also shows how you to use existing SQL*Loader control file statements for RowGen-supported /FIELD layouts. IRI's free ctl2ddf utility creates a data definition file from Oracle columns for central reference use by RowGen (or SortCL) users.

Consider an empty Oracle table, inventory, which you might populate nightly with new data. Its description is:
    Name      Type
    --------- ----------------
    CODE      CHAR(5)
    PRICE     NUMBER(6,2)
    PART      VARCHAR2(12)

Its control file, stream_in.ctl, is used to load the data into the table:
    LOAD DATA
    INFILE 'outstream.dat'
    TRUNCATE
    INTO TABLE inventory
    TRAILING NULLCOLS
    (code position(0001:0005) char,
    price position(0006:0012) DECIMAL EXTERNAL,
    part position(0013:0024) char)


To generate the equivalent layouts for RowGen use, run the included 'ctl2ddf' utility:  
ctl2ddf stream_in.ctl 
This produces a re-useable RowGen metadata repository:
    /FILE=outstream.dat
    /FIELD=(code, POS=1, SIZE=5)
    /FIELD=(price, POS=6, SIZE=7.2, NUMERIC)
    /FIELD=(part, POS=13, SIZE=12)

As you can see:
The name inventory.ddf came from the LOAD table entry in the control file. The .ddf's /FILE entry uses the INFILE name from the control file. 

To generate more meaningful data, we'll change one of the field statements in inventory.ddf to reference SET file values instead. That is, change the part field statement to the following:
    /FIELD=(part,SET=pts.set,POS=13,SIZE=12)
If you do not make this change, the example will run, but the part field would consist of randomly-generated (vs. selected) ASCII characters.

The RowGen job script, stream_out.rcl, relies on the .ddf FIELD layouts:
    /SPEC=inventory.ddf    # use central layouts
    /INFILE=real_file      # future placeholder
    /INCOLLECT=15000000    # generate 15M rows
    /OMIT WHERE price < 0  # no negative values
    /KEY=code              # query field sort
    /OUTFILE=outstream.dat # feeds SQL*Loader

This script will generate 15 million records to  load, ordered by 'code.'  Many more output files in different sizes and formats could have been created at the same time. The input file placeholder is there in case you will run the compatible SortCL application on real input data in the future!

Single-pass your data generation and Oracle load via stream.bat:
    mkfifo outstream.dat
    rowgen /spec=stream_out.rcl |
    sqlldr control=stream_in.ctl DIRECT=TRUE

RowGen and SQL*Loader processes begin simultaneously, so the entire pre-sorted load happens very quickly. To verify the results, check the contents of the new table in Oracle 9i, which now has 15 million rows:
    >SELECT * from inventory WHERE rownum < 5;
    CODE  PRICE      PART
    ----- ---------- ------------
    AAACJ 6277.37    switches
    AAAKO 5332.25    sanders
    AAAwB 2312.32    plyers
    AABCM 7628.62    lightbulbs
With just the first four rows shown, you can see that the:
     "CODE" field is ordered alphabetically
     "PRICE" field contains no negative values
     "PART" field contains randomly selected values from a set file. 
A multi-CPU RowGen license will improve sort and load performance.

About CoSORT and The CoSORT Journal
  • CoSORT solutions serve data warehouse (ETL) architects, very large database (VLDB) administrators, mainframe sort migrators, and developers and independent software vendors (ISVs) building faster sorting and data transformation into their applications.
  • CoSORT delivers the IT industry's fastest UNIX sort engine and one of its most powerful flat-file manipulation and reporting programs: SortCL, which combines: row filtering and conditional selection, sort/merge and joins, drill-down aggregation and cross-row calculation, conversion and generation of more than 100 data types, database sequencing, and multi-target, multi-level output reformatting for reports, hand-offs, and DB load utilities.
  • Other special features include: coroutine sort architecture; fully tunable and scalable parallel sort performance on all multi-CPU UNIX and Windows servers; cross-calculation on aggregated values and aggregation on cross-calculated values, cross-table joins (matching) integrated with data conversion and expression logic; multinational date and timestamp support; a cross-platform Java GUI; and, e-commerce reporting via CSV/CLF and IP Address manipulation, as well as ELF/HTML input/output, respectively.
  • CoSORT also has plug-n-play replacements or parameter converters for sorting in: ACUCOBOL-GT, Amdocs Ensemble (telecom billing) Ascential DataStage; Informatica PowerCenter and PowerMart; Cincom Supra; IBM's DB2 loader and MVS/VSE sorts; MF COBOL Workbench, Net and Server Express; SAS System; Software AG Natural; Sun MRP; and, UNIX SVR4 (/bin/sort).
  • IRI has begun to offer other data manipulation and management solutions like: FACT for fast unloads from Oracle; netCONVERT for mainframe data conversion and reformatting;  x-PRESS for fast, and secure data compression and decompression; RowGen for custom-formatted random file generation; Logon for controlling and auditing access to UNIX systems; and Permitas for licensing and activating software applications.
  • The CoSORT Journal is a quarterly Email newsletter designed to keep subscribers up-to-date on salient news and events at CoSORT/IRI, Inc. Past newsletters are archived for 2 years here.


Copyright © 2005 Innovative Routines International (IRI), Inc.
All rights reserved. CoSORT, SortCL, Permitas, and Rowgen are trademarks of IRI. FACT is a trademark of CoSORT Korea, Ltd. All other names mentioned herein may be copyrights or registered copyrights of their respective owners.

To remove an Email address from future CoSORT Journal mailings, please email news@iri.com. To subscribe, please provide your contact details here. To contact an IRI agent for assistance, call 1-800-333-SORT, or email cosort@iri.com. To find the IRI agent in your country, click here.

 

Question: What's Fast, Cost-Effective ETL?

Ever since Database Trends Magazine declared CoSORT 'The Emerging ETL Engine' in 1999 -- and now amid growing interest in IRI's piped Oracle ETL solution; i.e. fact | sortcl | sqlldr -- the goal of fast but affordable VLDW ETL has driven IRI development.

Leaping Every ETL Hurdle at Once

You can use IRI tools in a variety of ways to make light work of high-volume ETL operations, and uniquely optimize and/or combine these individual elements in a simple, inexpensive way.

If you run Oracle on UNIX, you're especially in luck: 

How IRI's Product Combo is an Oracle ETL Solution: 

CoSORT's FAst extraCT (FACT) product for Oracle rapidly Extracts tables to faster flat-file formats. FACT can also pipe this data to, and create the extract's metadata for, CoSORT's Sort Control Language (SortCL) program and SQL*Loader. SortCL then runs one or more simultaneous (or standalone/batch) Transformation jobs and reports. Multiple filtered, reformatted and CoSORTed output targets include: files, reports, and pipes to Loading tools like Oracle's SQL*Loader. This approach is simple, less expensive, and faster than other ETL tools -- (reusable) metadata build automatically and the jobs run outside the DB!

How IRI Tools Perform & Speed ETL Jobs Generally:
1) CoSORT's FACT performs rapid Extractions.
2) CoSORT's SortCL UIs perform, speed, and
    combine Transformations in 1 pass like:
       Select-Sort-Join-Calc-Convert-Aggregate-Remap
3) CoSORT speeds bulk Loads by pre-sorting.
4) CoSORT's SortCL sets up and runs faster
    than equivalent ETL or PL/SQL procedures.
5) Flat files are the fastest way to stage very
     large volumes of warehouse and ODS data.

If you already use a conventional ETL tool, CoSORT will make its high-volume operations more efficient: 

How IRI Technologies Speed Other ETL Products:
1) Easy metadata interface to ETI*Extract
2) Dual-mode PlugIn for DataStage XE sort
3) Plug'n'Play AEP for Informatica Sorter Tx 
4) Direct drop-in sort replacements for SAS
    Software AG Natural, and top COBOL tools. 

And remember, if you also need to produce reports in or outside of the ETL context, CoSORT's Sort Control Language (SortCL) program is a full-function, custom report generator:

Please click here to link to the details, or contact us with any questions or comments.

Tech Tip

Using SortCL to Rank Data

CoSORT Sort Control Language (sortcl) program supports a  wide range of data aggregation and expression functions. Did you know that the count feature could also be used for ranking?

Suppose you want to rank salespeople by the value of sales they have made, with the highest sales ranking first. The following file contains names of salespeople and their associated sales (in thousand dollar amounts).

   Mary 23
   Robert 129
   John 345
   Vanessa 31
   Donald 345
   Sarah 54
   Laura 45
   Henry 98
   Richard 31
   Tom 29
   Nancy 18
   Barbara 32

The sortcl script below ranks the salespeople in order of sales made. Notice that the sort is in descending order so that highest sales values are returned first. Also, the rank field is declared as NUMERIC so that the values will be right-justified.

/INFILE=salesbyname
   /FIELD=(person,POS=1,SIZE=11)
   /FIELD=(sales,POS=13,SIZE=3)
/SORT
   /KEY=(sales,DESCENDING)
/OUTFILE=sale_rank.out
   /FIELD=(rank,POS=1,SIZE=2.0,NUM)
   /FIELD=(person,POS=4,SIZE=11)
   /FIELD=(sales,POS=15,SIZE=3)
   /COUNT rank RUNNING WHERE sales


That script produces this output:

   1  John       345
   1  Donald     345
   2  Robert     129
   3  Henry      98
   4  Sarah      54
   5  Laura      45
   6  Barbara    32
   7  Richard    31
   7  Vanessa    31
   8  Tom        29
   9  Mary       23
   10 Nancy      18 

Notice how equal performers get equal rankings. For more information on using sortcl for ranking, see the CoSORT 8.2.2 User Manual, or contact an IRI engineer directly via support@iri.com.