Speeding up I/O Processing of Flatfiles in SAS

I want to thank Paul Dorfman for his insight into peekc processing techniques.

Times for converting 1,000,000 x 80 byte records with 10 x 8 byte numeric
fields. I realize not all fields are numeric on any production flatfile.
Output SAS table has 1 million rows, 10 columns ( x1 -- x10 ).

Create SAS Table from flatfile.
===============================
Format    CPU Seconds
======    ===========
8.           57.47
PD8.         28.77
PIB8.        14.67
RB8.         12.54
$80           6.93   ( Read as 80 bytes )

RB8. requires the least conversion time, since this is the native format
for SAS numeric variables.

==================================================================

For very fast flatfile creation consider:

The following two programs create 300 records of 27648 bytes.
Each record has 3456 x 8 byte double float fields.
The output flatfiles are identical.

                 Seconds
             CPU      Elapsed
             =====    =======
Program A    26.13     27.27
Program B     0.17      1.09

Program A:

   data _null_ ;
     array x{3456} _temporary_ ( 3456*(255));
     file 'zz.$48$.flt' lrecl=27652;
     do i = 1 to 300;
       /* one PUT per field: 3456 formatted writes per record */
       do j= 1 to dim(x);
         put x(j) rb8. @;
       end;
       put;
     end;
   run;

NOTE: The file 'am.$48$.flt' is:
      Dsname=zz.$48$.FLT,
      Unit=3390,Volume=xxxxxx,Disp=SHR,Blksize=27648,
      Lrecl=27648,Recfm=FB

NOTE: 300 records were written to the file 'zz.$48$.flt'.

NOTE: The DATA statement used the following resources:
      CPU time - 00:00:26.13
      Elapsed time - 00:00:27.27
      EXCP count - 358
      Task memory - 4905K (2568K data, 2337K program)
      Total memory - 12545K (6944K data, 5601K program)

Program B:

   data _null_ ;
     array x{3456} _temporary_ ( 3456*(255));
     length s $27648;
     /* address of the first array element */
     a1 = addr(x{1});
     file 'zz.$48$.flt' lrecl=27652;
     do i = 1 to 300;
       /* copy the whole 27648-byte array image into one character value */
       s = peekc(a1,27648);
       put s;
     end;
   run;

NOTE: The file 'am.$48$.flt' is:
      Dsname=zz.$48$.FLT,
      Unit=3390,Volume=xxxxxx,Disp=SHR,Blksize=27648,
      Lrecl=27648,Recfm=FB

NOTE: 300 records were written to the file 'zz.$48$.flt'.
NOTE: The DATA statement used the following resources:
      CPU time - 00:00:00.17
      Elapsed time - 00:00:01.09
      RSM Hiperspace time - 00:00:00.00
      EXCP count - 319
      Task memory - 2701K (251K data, 2450K program)
      Total memory - 12545K (6944K data, 5601K program)

=============================================================================

Message 1 in thread
From: Paul Dorfman (paul_dorfman@HOTMAIL.COM)
Subject: Re: Benchmarks - Flatfile formats for fast SAS processing
Newsgroups: comp.soft-sys.sas
Date: 2001-07-29 01:02:32 PST

Roger,

Most interesting and informative.

>The native format for SAS to read and write on a
>MVS mainframe is
>
>$char ( EBCDIC )
>rbw.d ( Floating Point - very flexible format )
>      ( also the internal format for S390 architecture )

It is the *SAS* native format for *any* architecture. S/390 has no single
native architecture - that is, it provides the ability to compute natively
in packed-decimal, binary, or float, if the language supports them. PL/I and
Cobol support all of the above, SAS - double float only.

>These should be used whenever possible.

Depending on what is needed to be saved - disk space or CPU time.
Storage-wise, it is not always best to store as RB8., for quite large
integers can be stored in much fewer bytes as binary. It is hard to argue,
though, that for an arbitrary numeric value, RB is the fastest way to store
the value '(s)as is' - just as you note below.

>Minimal additional conversion time (+20%) over RBw.d
>
>PIBw
>IBw.d
>
>Very slow conversion (zoned decimal / Packed decimal)
>
>X.x ie 10., 12., 9., 8.3
>PDw.d
>
>
>In addition, storing dates on flatfiles (as number of days since Jan 1,
>1960),
>in RB4 or PIB3,
>will speed up SAS processing. COBOL should be able
>to do this because DB2 dates have a similar format???
>Number of days since Jan 1, 1920????
>
>
>Benchmarks (All on mainframe)
>
>
>Times for converting 1,000,000 x 80 byte records with 10 x 8 byte
>numeric fields. I realize not all fields are numeric on any
>production flatfile. SAS table has 1 million rows 10 columns
>x1 -- x10.
>Create SAS Table from flatfile.
>===============================
>Format CPU Seconds
>====== ===========
>8.     57.47
>
>PD8.   28.77
>
>PIB8.  14.67
>
>RB8.   12.54
>
>$80     6.93 ( Read as 80 bytes )

OTOH, in memory, all x1-x10 are already stored as RB8., so why spend time
*converting* them to RB8? Instead, try to benchmark this:

   data _null_;
     array nn x1-x10 ;
     length s $80 ;
     /* address of x1, the first array element */
     a1 = addr(x1) ;
     file 'f:\test.txt' ;
     do until (end) ;
       set a end=end ;
       /* copy the 10 x 8 bytes as they sit in memory */
       s = peekc(a1,80) ;
       put s $char80. ;
     end ;
   run ;

The array ensures that the numeric variables involved are consecutive in
memory starting with x1 regardless of their logical order in PDV or *any*
other specifications.

Kind regards,
====================
Paul M. Dorfman
Jacksonville, Fl
====================

Message 2 in thread
From: Peter Crawford (Peter@CrawfordSoftware.demon.co.uk)
Subject: Re: Benchmarks - Flatfile formats for fast SAS processing
Newsgroups: comp.soft-sys.sas
Date: 2001-07-29 12:30:02 PST

On winXX the comparison is not so dramatic.

2 step demo based on Roger's structure:-
First put as one block                 0.76 seconds real time
Second put individual vars as RB8.     1.43 seconds real time
using a do loop instead of one large (macro generated) statement

240  data _null_ ;
241  array x{3456} _temporary_ ( 3456*(255));
242  length s $27648;
243  a1 = addr(x{1});
244  file 'am.$48$.flt' lrecl=27652;
245  do i = 1 to 300;
246  s = peekc(a1,27648);
247  put s;
248  end;
249  run;

NOTE: The file 'am.$48$.flt' is:
      File Name=C:\Program Files\SAS Institute\SAS\V8\am.$48$.flt,
      RECFM=V,LRECL=27652

NOTE: 300 records were written to the file 'am.$48$.flt'.
      The minimum record length was 27648.
      The maximum record length was 27648.
NOTE: DATA statement used:
      real time 0.76 seconds

250  data _null_ ;
251  array x{3456} _temporary_ ( 3456*(255));
252  file 'am.$48$.flt' lrecl=27652;
253  do i = 1 to 300;
254  do j= 1 to dim(x);
255  put x(j) rb8. @;
256  end;
257  put;
258  end;
259  run;

NOTE: The file 'am.$48$.flt' is:
      File Name=C:\Program Files\SAS Institute\SAS\V8\am.$48$.flt,
      RECFM=V,LRECL=27652

NOTE: 300 records were written to the file 'am.$48$.flt'.
      The minimum record length was 27648.
      The maximum record length was 27648.
NOTE: DATA statement used:
      real time 1.43 seconds

Xlr82sas writes
>Paul,
>
>Thanks for your insights.
>
> I think you are onto something.
>
> I did not see a big difference when
> writing 10 8-byte (rb8) fields using
> your peekc. However, I noticed a
> phenomenal efficiency when writing
> a half track of x1-x3456 rb8 fields
> using peekc and a 27648 character
> variable. Is there something wrong with
>my code? I examined and even read back
>the flatfile, all seems ok. I expect Poking
>the flatfile back to a temp array will be very fast.
>
>Writing 300 records of one character variable containing x1-x3456 rb8 floating
>point vars
>CPU 0.16 Seconds Elapsed 0.9 seconds
>
>Yes 0.16 CPU seconds.
>
>Writing 300 records using put x{1} x{2} .. x{3456}
>CPU 21.99 Elapsed 5.3
>
>Yes 21.99 CPU seconds
>
>It would be nice if SAS had a quick way to dump the PDV to a flatfile.
>
>Note the put statement with macro in the second, slow program. It is my
>understanding that temporary arrays have to be fully specified in a put
>statement.
>
>
>   data _null_;
>     length s $27648;
>     array x{3456} _temporary_ ( 3456*(255));
>     a1 = addr(x{1});
>     file 'am.$48$.flt';
>     do i = 1 to 300;
>       s = peekc(a1,27648);
>       put s;
>     end;
>   run ;
>
>
>
>   %macro doem;
>   data _null_;
>     array x{3456} _temporary_ ( 3456*(255));
>     file 'am.$48$.flt';
>     do i=1 to 300;
>       put (
>         %do i=1 %to 3456; x{&i} %end;
>       ) ( 3456* rb8. );
>     end;
>   run ;
>   %mend doem;
>   %doem;
>
>Roger J DeAngelis
>CompuCraft Inc
>XLR82SAS@aol.com ( Accelerate to SAS )
>http://members.aol.com/xlr82sas/utl.html

--
Peter Crawford