Success Story
CoSORT Powers High-Volume ETL for Telco Data Warehouse
(published July 2004, DM Review)
Reviewer
Beno Pradekso, ETL architect for Solusi247's telco engagements.
Background
Solusi247 is a Jakarta-based data warehouse consultancy specializing
in the telecommunications industry. Solusi247 combines expertise
in data integration and staging (as well as enterprise reporting)
with domain knowledge of both fixed and mobile (especially GSM and
CDMA) telecommunications. In Solusi's most recent client engagement,
the central DW and reporting tasks involved extracts from MSC, IN,
MMSC, GSN, billing systems, voucher management systems (prepaid
services), CRM and other network data sources.
Platforms
Sun SF68000 with eight 900 MHz CPUs and 8GB of main memory. There
are also 5TB storage SANs with four fibre channel I/O, three Blade
servers with 3 GHz CPUs and 2GB of memory for binary-ASCII conversions,
and one data collection machine.
Problem Solved
Our telco customer's MSC (switch) produces more than 50 million
call detail records per day. The data must be transformed and loaded
into a central database used by more than 600 customer care staff
throughout Indonesia. The preprocessing and loading is performed
between 1:00 a.m. and 4:00 a.m. - a three-hour window for resolving
references such as type of call, operator code, area code (down
to detail location) and call metrics (minutes and charges). Solusi
uses our own product (SDTRAT) to convert the binary data into ASCII
on the three Blade Servers - this takes 45 minutes. The final load
takes 45 minutes. CoSORT must perform all of the data integration
and staging between conversion and loading. That is, within the
remaining portion of the staging window, CoSORT must filter, sort,
join, aggregate and reformat all 50 million rows. Our customer is
highly satisfied with our solution, particularly because of CoSORT's
speed, fit and scalability.
Product Functionality
We can quickly design and execute the customized transformations
that we need with CoSORT's sort control language (SortCL), a fourth-generation
data definition and manipulation language. We only need to create
our source layouts once, in centralized SortCL meta data repositories,
and then simply refer to our field names in SortCL job scripts when
remapping the input data through process to output (targets). We
benefit from SortCL's ability to rapidly perform and combine filtering,
sorting, merging, aggregating and reformatting in one or just a
few job scripts and in one I/O pass.
Strengths
CoSORT's sort and join speed is incredible, beating every other
database, ETL and sort tool on the market. Because the SortCL syntax
is so logical and explicit, it has been easy for us to add more
simultaneous processing during the client's same production window,
such as grouping data according to type (incoming/outgoing, peak/off-peak,
days/weekends, etc.).
Weaknesses
Our developers found it easier to code the more complex SortCL outer
join scripts using word processors than via CoSORT's Java GUI for
SortCL. Although SortCL coding is straightforward, we would prefer
to use the GUI to automate our meta data creation.
Selection Criteria
Because of the enormous volume of call data moving through the ETL
process, we had to find a solution that would be easy to set up,
scale well (and not impact concurrent apps), and be flexible enough
to meet our growing application requirements. We evaluated and benchmarked
other sort-related and ETL tools, as well as our own 3GL programs,
and found CoSORT to be not only the fastest flat-file manipulator
on the market, but the best programmatically, functionally and financially.
Deliverables
We include CoSORT with our own Web-based portal application, which
includes a warning/monitoring system, scheduling, workflow and reference-data
maintenance. CoSORT's SortCL runs on the command-line, from a Java
GUI, in batch or via program calls. Additionally, SortCL is only
one of several UIs in CoSORT, a suite which also includes: utilities
to convert various third-party data layouts and legacy sorts to
SortCL; an interactive, menu-driven program for sort/merge operations
(which can save the specs for batch runs); and approximately a dozen
native plug-and-play replacements for slower, third-party sorts
such as those within DataStage, Informatica, MF COBOL, SAS, Software
AG Natural and the UNIX O/S.
Vendor Support
CoSORT support has been very helpful and responsive - throughout
the world. The CoSORT engineers also volunteer technical suggestions
to further improve our processing performance. Their support has
really helped us grow our business as application providers and
consultants.
Documentation
CoSORT's manual and online help systems are clear and comprehensive.

|