Q4'2007 Newsletter from The CoSort Company
Innovative Routines International, Inc. (IRI)

Version 2 of the CoSort Test Data Tool, RowGen, will be released shortly. RowGen creates safe, relational tables, realistic test files, and custom-formatted reports. With RowGen, you can test database and ETL operations, stress applications, benchmark different platforms, and outsource your file and report formats without exposing confidential information. Consider these:

Uses of Test Data

Database Population
By populating tables with realistic data, you can test complex update, load, and query scenarios, and build ETL operations in a safe area. The latest version of RowGen leverages your data models and table relationships to synthesize massive, referentially correct production databases quickly and safely. With RowGen, you can simulate an entire enterprise data warehouse.

Application Development
Accurate test data allows you to more effectively develop and stress test your programs. By using test data with real value ranges and volumes, your applications are more likely to work in production. RowGen’s explicit scripting syntax and high-speed data generation engine accelerate the creation of correct test data. This can eliminate key development-stage dependencies, and improve the quality of your applications — increasing customer satisfaction and decreasing support costs.

Privacy Compliance
Populating applications and databases with safe test data eliminates the risks of testing with production data. RowGen helps you comply with industry and corporate data security regulations.

Format Sharing
By building test data directly into your desired file or report layouts, you outsource real formats and not private content. RowGen can synthesize and manipulate test data in any format, value range, and volume that you define for structured data sets or reports, so you can represent reality while eliminating data privacy concerns.

Benchmarking
You may need to build many large, representative files at once. RowGen can rapidly generate a scaling test suite where you can run a complete and consistent battery of tests against different software and hardware configurations.


What is RA-RowGen?
And How Does this GUI Work? 

RowGen v2 leverages the power of three leading-edge tools:

RapidACE LLC and IRI have partnered to deliver the RapidACE Graphical User Interface for RowGen. The RA-RowGen GUI helps you automatically create, modify, and run the RowGen control language (.RCL) job scripts necessary to build referentially correct test data for:

The GUI allows you to create drag-and-drop categories with generic settings to handle multiple classes of tables from a DDL, and then apply the script creation engine to the category. RA-RowGen also contains a scripting wizard to create test data in custom flat file and report structures. The image below shows both aspects of the RA-RowGen GUI:

RA-RowGen allows you to specify the parameters of the test data you need to generate using a point-and-click interface. Each .RCL file can be further customized using the embedded text editor, giving you full control over your test data generation scripts in the familiar language of RowGen (see the job samples that follow).


Need Referentially Correct Test Data?
How RowGen v2 Creates Test Data with Referential Integrity

In addition to reflecting the table layouts defined in your data models, your test tables must be able to reflect the business rules represented in the one-to-many and many-to-many relationships of the RDBMS. The RA-RowGen GUI applies the constructs of your data models to preserve referential integrity the test data. RA-RowGen automates the creation of RowGen Control Language (.RCL) job scripts that build the SET and data files according to DDL primary and foreign key constraints:

 

These auto-generated scripts will output tab-delimited test table and SET files where the tables share dependent SET file values and data types. RA-RowGen will also build and execute the RowGen job scripts in the order that maintains referential integrity across all your test tables. Specifically, RA-RowGen ensures that dependent SETs are created before the test tables. Once RowGen has produced the (pre-sorted) output files, specify a direct path load in your database's load utility. This is the fastest way to populate high-volume test tables.


Build XML Test Files
Plus CSV, COBOL, Others -- Simultaneously!

RowGen allows you to produce multiple test files in standard formats such as CSV, LDIF, ISAM, and XML. By default, all targets produced by RowGen are consistent with the text file format equivalent to RS (record sequential) in COBOL. Test files consist of either variable- or fixed-length records. However, in the output section of a RowGen job script, you can specify any of the other supported file types.

RowGen can also produce output fields with many different data types. Conversion from one data type to another is also possible, and useful if you are producing multiple output files with different data type requirements.  Following is an example of a RowGen job creating an XML test file:

# Test Data Generation Phase
/INFILE=generate
  /FIELD=(Item, POSITION=1, SIZE=10, SEPARATOR="|", UPPER)
/FIELD=(CustNum, POS=2, SIZE=4.0, SET={[1000,9999]}, SEP="|", NUMERIC)
  /FIELD=(Amount, POS=3, SIZE=6.2, SET={[200,900]}, SEP="|", NUMERIC)
# Action Phase
/REPORT # do not sort
# Test Data Production (Formatting) Phase
/OUTFILE=test.xml
  /PROCESS=XML
  /FIELD=(SEQUENCER, POS=1, SEP='|',xdef="/Sales/Sale@Num")
  /FIELD=(Item, POS=1, SIZE=10, SEP="|", UPPER)
  /FIELD=(CustNum, POS=2, SIZE=4.0, SEP="|", NUMERIC) /FIELD=(Amount, POS=3, SIZE=6.2, SEP="|", NUMERIC)

 

test.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

<Sales>

<Sale Num="1">
  <Item>TBEHHMWBZQ</Item>
  <CustNum>5672</CustNum>
  <Amount>364.76</Amount>
</Sale>

<Sale Num="2">
  <Item>SICQQYZSFV</Item>
  <CustNum>5819</CustNum>
  <Amount>848.69</Amount>
</Sale>

</Sales>


Share Formats, Validate Apps
RowGen Creates Custom Test Reports

RowGen v2 generates test data for databases and applications with output files in fixed and delimited formats such as XML, CSV, LDIF, etc. Beyond that, RowGen includes the data transformation and reporting functions of CoSort’s Sort Control Language (SortCL) engine. This means that RowGen can create, segment, and customize test data for multiple business targets at once. Each target can have its own layout and file format (even HTML), including the detail and summary formats common to 2D reports or dimensional OLAP cubes. Test report designers can even create derived fields based on field conditions, functions, calculations, and/or aggregations. Why is this important? Because you may need to outsource real report formats but not real data, or because you need to validate your applications using test data. Consider this job sample:

# Test Data Generation Phase
/INFILE=trans.in   # placeholder only; no real input recognized
    /INCOLLECT=6    # number of rows to generate
   /FIELD=(trans_type,POS=1,SIZE=1,set = {A,B,C})
   /FIELD=(acct_no,POS=3,SIZE=5,set=acct.set)
   /FIELD=(amount,POS=9,SIZE=6.2,NUMERIC)

# Action Phase
/SORT
   /KEY=acct_no

# Test Data Production (Formatting) Phase
/OUTFILE=transactions.out             # acct_no sub-totals layout
   /RECSPERPAGE=1
   /HEADREC="---------------------\n"
   /FOOTREC="\n"
   /DATA="Acct:"
   /FIELD=(acct_no,POS=7,SIZE=5)
   /FIELD=(acct_total,POS=15,SIZE=7.2,NUMERIC)
   /SUM acct_total FROM amount BREAK acct_no

/OUTFILE=transactions.out             # amount grand total layout
   /HEADREC="=====================\n"
   /FIELD=(acct_total,POS=15,SIZE=7.2,NUMERIC)
   /SUM acct_total FROM amount

/OUTFILE=transactions.out             # detail records layout
   /HEADREC="Type           Amount\n\n"
   /FIELD=(trans_type,POS=1,SIZE=1)
   /FIELD=(acct_no,POS=7,SIZE=5,DIGIT)   
   /FIELD=(amount, POS=16,SIZE=6.2,NUMERIC)

transactions.out

Type           Amount
A     27384    721.16
C     27384    -62.11
---------------------
Acct: 27384    659.05
B     96023    294.35
A     96023    618.45
A     96023     -1.69
---------------------
Acct: 96023    911.11
B     98341    634.73
---------------------
Acct: 98341    634.73
=====================
              2204.89

The above script creates a summary report with test data you can share. You can also use RowGen to verify your own application’s ability to aggregate the detailed test data that RowGen creates. By directing the detail records to one file, acct_no totals to another, and grand totals to yet another, you could easily compare RowGen’s results with your application’s sub and/or grand total values.


Need to Master Your Data Models?
RapidACE Integrates and Visualizes Entire EDWs

IRI business partner RapidACE has developed a data model Rapid Architectural Consolidation Engine. The RapidACE Professional Edition visualizes, integrates, and leverages disparate data models across the enterprise. This ground-breaking new tool for data architects creates a single, compliant enterprise-wide data model in less than one day, while improving the predictability and adaptability of your data warehouse. Think of RapidACE as an ETL tool for data models, as you watch generate new 3D models based on changes in source systems. New business requirements ripple through the tool quickly, allowing for unprecedented response times for prototyping, what-if analyses, model analyses, project audit and review, data integration project scope assessments, and feasibility analyses.

Following is a center console view of RapidACE showing DDL editing:

Below is an example of RapidACE’s 3D model of semantic cluster analysis:

RapidACE gives you an immediate view of table dependencies throughout the data warehouse, weighting and distancing their relational importance to the model, and each other, using semantic ontologies and self-improving artificial intelligence.


Try What's New
If you are interested in creating test data from data models and metadata through RowGen, or getting a better handle on your disparate data models through RapidACE, Email rowgen@iri.com. We will be happy to arrange a webinar to answer your questions and provide a free trial.

Copyright 2007, Innovative Routines International (IRI), Inc., The CoSort Company