Large data volumes (for example, more than one million rows) can be slow to transform, even after consulting and tuning. Typical bottlenecks are large sorts, joins, aggregations, and loads, and sometimes unloads. Parallelizing or optimizing these steps in other layers or tools can be unwieldy, if not expensive, and may degrade performance for other users.
Solutions:
1) CoSort Sort Stage Plug-In for DataStage
Speed up sorting directly within DataStage Server Edition with CoSort's
unique Sort Stage Plug-In for DataStage. The plug-in can improve sort
performance by up to 10X with no interface changes, and the subsequent
join, aggregation, and load steps should also run faster.
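Why should joins benefit from a prior sort? When both inputs are already ordered on the join key, a merge join can pair matching records in one sequential pass over each input, with no hash table and no re-sort. The Python sketch below illustrates that general technique only; it is not CoSort's or DataStage's implementation, and the two-field (key, payload) record layout and the name merge_join are assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical two-field record layout: (join key, payload).
Row = Tuple[str, str]


def merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Tuple[str, str, str]]:
    """Inner-join two inputs that are already sorted ascending on their key field.

    Each input is read once, front to back; no hash table or re-sort is needed.
    Duplicate keys on either side produce the full cross product for that key.
    """
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left_it, None)        # left key is behind: advance left
        elif l[0] > r[0]:
            r = next(right_it, None)       # right key is behind: advance right
        else:
            key = l[0]
            # Collect the run of equal keys on each side.
            left_run: List[str] = []
            while l is not None and l[0] == key:
                left_run.append(l[1])
                l = next(left_it, None)
            right_run: List[str] = []
            while r is not None and r[0] == key:
                right_run.append(r[1])
                r = next(right_it, None)
            for lv in left_run:
                for rv in right_run:
                    yield (key, lv, rv)
```

If either input were unsorted, the loop would skip past matches it had not yet seen, so the sort really is a precondition for this single-pass join.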
2) Fast Transformations alongside DataStage
By running CoSort's Sort Control Language (SortCL) program on files in the
file system alongside IBM WebSphere DataStage Server or Enterprise Edition,
you can perform fast sorts, joins, and aggregations, all in the same job
script and I/O pass. While running large data transformation tasks in
parallel, the same job can also specify file-format and data-type
conversions, field-level encryption and other data-privacy functions,
custom reports, and pre-sorted load files.
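Sorting file data that is larger than memory is typically done with an external merge sort: sort memory-sized chunks, write each sorted run to a temporary file, then stream a k-way merge of the runs. The Python sketch below shows that generic technique only; it does not reflect SortCL's internals or syntax, and the function name external_sort and the default chunk_size are illustrative assumptions. Records are assumed to be newline-terminated text lines compared as whole strings.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def external_sort(lines: Iterable[str], chunk_size: int = 100_000) -> Iterator[str]:
    """Sort newline-terminated records that may not fit in memory.

    Phase 1: read up to chunk_size records, sort them in memory, and spill
    each sorted run to a temporary file.
    Phase 2: stream a k-way merge of all runs with heapq.merge.
    chunk_size is an assumed tuning knob, not a CoSort parameter.
    """
    run_paths: List[str] = []
    chunk: List[str] = []

    def spill() -> None:
        # Sort the current chunk and write it out as one sorted run.
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(chunk)
        run_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            spill()
    if chunk:
        spill()

    files = [open(p) for p in run_paths]
    try:
        # Each run file iterates its lines in sorted order; merge them lazily.
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

Only one chunk plus one buffered line per run is held in memory at a time, which is what lets this approach scale to inputs far larger than RAM.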
If you still want to use the Aggregator stage in DataStage, CoSort can
help improve its performance. Add a Sequential File stage before the
Aggregator stage, and run a SortCL script to pre-sort that file
externally on the break (grouping) keys. Then identify those fields as
sorted in the Aggregator stage.
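Pre-sorting on break keys helps because an aggregator reading sorted input can finish each group as soon as the key changes, holding only one group in memory rather than every group at once. The following Python sketch illustrates this sorted (streaming) aggregation; the (key, amount) record layout and the name aggregate_sorted are hypothetical, and the code does not model the DataStage stage itself.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (break key, amount).
Record = Tuple[str, float]


def aggregate_sorted(records: Iterable[Record]) -> Iterator[Tuple[str, int, float]]:
    """Emit (key, count, total) per break-key group from input pre-sorted on the key.

    groupby only groups *adjacent* equal keys, so each group is complete as
    soon as the key changes and only one group is in progress at a time.
    """
    for key, group in groupby(records, key=itemgetter(0)):
        count = 0
        total = 0.0
        for _, amount in group:
            count += 1
            total += amount
        yield key, count, total
```

On unsorted input such as [("a", 1.0), ("b", 1.0), ("a", 1.0)], the same function emits the "a" group twice, which is why the sort has to happen before the aggregation.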
To simplify CoSort operations with DataStage, as well as the creation of
realistic test data for DataStage jobs, Meta Integration Technology's Model
Bridge (MIMB) software can create SortCL and RowGen data definition files
from the flat-file layouts you have already defined in .DSX format. This
saves you from manually rewriting the field layouts of all your input and
output files, making it easier to run CoSort tools with DataStage!