Wednesday, July 22, 2009

SQL Loader

SQL*Loader (sqlldr) is the utility to use for high-performance data loads. The data can be loaded from any text file and inserted into the database.

The Control File

The SQL*Loader control file contains information that describes how the data will be loaded. It contains the table name, column datatypes, field delimiters, etc. It simply provides the guts for all SQL*Loader processing.
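For orientation, a minimal control file might look like the following sketch; the emp table, the emp.dat file, and the date mask are illustrative, not taken from any particular schema:

LOAD DATA
INFILE 'emp.dat'
INTO TABLE emp
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
( empno    INTEGER EXTERNAL,
  ename    CHAR,
  hiredate DATE "DD/MM/YYYY" )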


Manually creating control files is an error-prone process. The following SQL script (controlfile.sql) can be used to generate an accurate control file for a given table.

SCRIPT

set echo off ver off feed off pages 0
accept tname prompt 'Enter Name of Table: '
accept dformat prompt 'Enter Format to Use for Date Columns: '
spool &tname..ctl

select 'LOAD DATA' || chr(10) ||
       'INFILE ''' || lower(table_name) || '.dat''' || chr(10) ||
       'INTO TABLE ' || table_name || chr(10) ||
       'FIELDS TERMINATED BY '',''' || chr(10) ||
       'TRAILING NULLCOLS' || chr(10) || '('
  from user_tables
 where table_name = upper('&tname');

select decode(rownum, 1, ' ', ' , ') ||
       rpad(column_name, 33, ' ') ||
       decode(data_type,
              'VARCHAR2', 'CHAR NULLIF ('||column_name||'=BLANKS)',
              'FLOAT',    'DECIMAL EXTERNAL NULLIF ('||column_name||'=BLANKS)',
              'NUMBER',   decode(data_precision, 0,
                                 'INTEGER EXTERNAL NULLIF ('||column_name||'=BLANKS)',
                                 decode(data_scale, 0,
                                        'INTEGER EXTERNAL NULLIF ('||column_name||'=BLANKS)',
                                        'DECIMAL EXTERNAL NULLIF ('||column_name||'=BLANKS)')),
              'DATE',     'DATE "&dformat" NULLIF ('||column_name||'=BLANKS)',
              null)
  from user_tab_columns
 where table_name = upper('&tname')
 order by column_id;



select ')'
from dual;
spool off


EXECUTE

host sqlldr scott/tiger@db control=upload.ctl
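For example, if EMP was entered at the script's prompts, the spooled file is emp.ctl and the load could be started from SQL*Plus the same way (the connection string and the optional log file name are illustrative):

host sqlldr scott/tiger@db control=emp.ctl log=emp.log

The log file is worth keeping: it records how many rows were loaded or rejected and the bind array size that was actually used.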

Benefits of Using Multiple INTO TABLE Clauses
Multiple INTO TABLE clauses enable you to:

Load data into different tables

Extract multiple logical records from a single input record

Distinguish different input record formats

Distinguish different input row object subtypes

In the first case, it is common for the INTO TABLE clauses to refer to the same table. This section illustrates the different ways to use multiple INTO TABLE clauses and shows you how to use the POSITION parameter.

Note:

A key point when using multiple INTO TABLE clauses is that field scanning continues from where it left off when a new INTO TABLE clause is processed. The remainder of this section details important ways to make use of that behavior. It also describes alternative ways of using fixed field locations or the POSITION parameter.
Extracting Multiple Logical Records
Some data storage and transfer media have fixed-length physical records. When the data records are short, more than one can be stored in a single, physical record to use the storage space efficiently.

In this example, SQL*Loader treats a single physical record in the input file as two logical records and uses two INTO TABLE clauses to load the data into the emp table. For example, assume the data is as follows:

1119 Smith      1120 Yvonne
1121 Albert     1130 Thomas

The following control file extracts the logical records:

INTO TABLE emp
(empno POSITION(1:4) INTEGER EXTERNAL,
ename POSITION(6:15) CHAR)
INTO TABLE emp
(empno POSITION(17:20) INTEGER EXTERNAL,
ename POSITION(21:30) CHAR)
Relative Positioning Based on Delimiters
The same record could be loaded with a different specification. The following control file uses relative positioning instead of fixed positioning. It specifies that each field is delimited by a single blank (" ") or with an undetermined number of blanks and tabs (WHITESPACE):

INTO TABLE emp
(empno INTEGER EXTERNAL TERMINATED BY " ",
ename CHAR TERMINATED BY WHITESPACE)
INTO TABLE emp
(empno INTEGER EXTERNAL TERMINATED BY " ",
ename CHAR TERMINATED BY WHITESPACE)

The important point in this example is that the second empno field is found immediately after the first ename, although it is in a separate INTO TABLE clause. Field scanning does not start over from the beginning of the record for a new INTO TABLE clause. Instead, scanning continues where it left off.

To force record scanning to start in a specific location, you use the POSITION parameter. That mechanism is described in Distinguishing Different Input Record Formats and in Loading Data into Multiple Tables.

Distinguishing Different Input Record Formats
A single datafile might contain records in a variety of formats. Consider the following data, in which emp and dept records are intermixed:

1 50   Manufacturing        (DEPT record)
2 1119 Smith      50        (EMP record)
2 1120 Snyder     50
1 60   Shipping
2 1121 Stevens    60

A record ID field distinguishes between the two formats. Department records have a 1 in the first column, while employee records have a 2. The following control file uses exact positioning to load this data:

INTO TABLE dept
WHEN recid = 1
(recid FILLER POSITION(1:1) INTEGER EXTERNAL,
deptno POSITION(3:4) INTEGER EXTERNAL,
dname POSITION(8:21) CHAR)
INTO TABLE emp
WHEN recid <> 1
(recid FILLER POSITION(1:1) INTEGER EXTERNAL,
empno POSITION(3:6) INTEGER EXTERNAL,
ename POSITION(8:17) CHAR,
deptno POSITION(19:20) INTEGER EXTERNAL)
Relative Positioning Based on the POSITION Parameter
The records in the previous example could also be loaded as delimited data. In this case, however, it is necessary to use the POSITION parameter. The following control file could be used:

INTO TABLE dept
WHEN recid = 1
(recid FILLER INTEGER EXTERNAL TERMINATED BY WHITESPACE,
deptno INTEGER EXTERNAL TERMINATED BY WHITESPACE,
dname CHAR TERMINATED BY WHITESPACE)
INTO TABLE emp
WHEN recid <> 1
(recid FILLER POSITION(1) INTEGER EXTERNAL TERMINATED BY ' ',
empno INTEGER EXTERNAL TERMINATED BY ' ',
ename CHAR TERMINATED BY WHITESPACE,
deptno INTEGER EXTERNAL TERMINATED BY ' ')

The POSITION parameter in the second INTO TABLE clause is necessary to load this data correctly. It causes field scanning to start over at column 1 when checking for data that matches the second format. Without it, SQL*Loader would look for the recid field after dname.

Distinguishing Different Input Row Object Subtypes
A single datafile may contain records made up of row objects inherited from the same base row object type. For example, consider the following simple object type and object table definitions, in which a nonfinal base object type is defined along with two object subtypes that inherit their row objects from the base type:

CREATE TYPE person_t AS OBJECT
(name VARCHAR2(30),
age NUMBER(3)) not final;

CREATE TYPE employee_t UNDER person_t
(empid NUMBER(5),
deptno NUMBER(4),
dept VARCHAR2(30)) not final;

CREATE TYPE student_t UNDER person_t
(stdid NUMBER(5),
major VARCHAR2(20)) not final;

CREATE TABLE persons OF person_t;

The following input datafile contains a mixture of these row object subtypes. A type ID field distinguishes between the three subtypes. person_t objects have a P in the first column, employee_t objects have an E, and student_t objects have an S.

P,James,31,
P,Thomas,22,
E,Pat,38,93645,1122,Engineering,
P,Bill,19,
P,Scott,55,
S,Judy,45,27316,English,
S,Karen,34,80356,History,
E,Karen,61,90056,1323,Manufacturing,
S,Pat,29,98625,Spanish,
S,Cody,22,99743,Math,
P,Ted,43,
E,Judy,44,87616,1544,Accounting,
E,Bob,50,63421,1314,Shipping,
S,Bob,32,67420,Psychology,
E,Cody,33,25143,1002,Human Resources,

The following control file uses relative positioning based on the POSITION parameter to load this data. Note the use of the TREAT AS clause with a specific object type name. This informs SQL*Loader that all input row objects for the object table will conform to the definition of the named object type.

Note:

Multiple subtypes cannot be loaded with the same INTO TABLE statement. Instead, you must use multiple INTO TABLE statements and have each one load a different subtype.
INTO TABLE persons
REPLACE
WHEN typid = 'P' TREAT AS person_t
FIELDS TERMINATED BY ","
(typid FILLER POSITION(1) CHAR,
name CHAR,
age CHAR)

INTO TABLE persons
REPLACE
WHEN typid = 'E' TREAT AS employee_t
FIELDS TERMINATED BY ","
(typid FILLER POSITION(1) CHAR,
name CHAR,
age CHAR,
empid CHAR,
deptno CHAR,
dept CHAR)

INTO TABLE persons
REPLACE
WHEN typid = 'S' TREAT AS student_t
FIELDS TERMINATED BY ","
(typid FILLER POSITION(1) CHAR,
name CHAR,
age CHAR,
stdid CHAR,
major CHAR)
See Also:

Loading Column Objects for more information about loading object types
Loading Data into Multiple Tables
By using the POSITION parameter with multiple INTO TABLE clauses, data from a single record can be loaded into multiple normalized tables. See case study 5, Loading Data into Multiple Tables, for an example. (See SQL*Loader Case Studies for information about how to access case studies.)
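As a minimal sketch of that technique (the emp_dept table and the comma-delimited record layout are assumptions for illustration, not the case study itself), a record such as 1119,Smith,50 could populate two tables by restarting the scan with POSITION(1) in the second clause:

INTO TABLE emp
(empno   POSITION(1) INTEGER EXTERNAL TERMINATED BY ',',
 ename   CHAR TERMINATED BY ',',
 deptno  INTEGER EXTERNAL TERMINATED BY ',')
INTO TABLE emp_dept
(empno   POSITION(1) INTEGER EXTERNAL TERMINATED BY ',',
 ename   FILLER CHAR TERMINATED BY ',',
 deptno  INTEGER EXTERNAL TERMINATED BY ',')

The FILLER field in the second clause skips over the name so that only empno and deptno end up in emp_dept.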

Summary
Multiple INTO TABLE clauses allow you to extract multiple logical records from a single input record and recognize different record formats in the same file.

For delimited data, proper use of the POSITION parameter is essential for achieving the expected results.

When the POSITION parameter is not used, multiple INTO TABLE clauses process different parts of the same (delimited data) input record, allowing multiple tables to be loaded from one record. When the POSITION parameter is used, multiple INTO TABLE clauses can process the same record in different ways, allowing multiple formats to be recognized in one input file.

Bind Arrays and Conventional Path Loads
SQL*Loader uses the SQL array-interface option to transfer data to the database. Multiple rows are read at one time and stored in the bind array. When SQL*Loader sends the Oracle database an INSERT command, the entire array is inserted at one time. After the rows in the bind array are inserted, a COMMIT statement is issued.

The determination of bind array size pertains to SQL*Loader's conventional path option. It does not apply to the direct path load method because a direct path load uses the direct path API, rather than Oracle's SQL interface.
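For reference, the load method is chosen on the sqlldr command line: conventional path is the default, and direct path is requested with the DIRECT parameter (the connection string and file names below are illustrative).

Conventional path (default; uses the bind array):
host sqlldr scott/tiger@db control=emp.ctl log=emp.log

Direct path (bypasses the bind array):
host sqlldr scott/tiger@db control=emp.ctl log=emp.log direct=true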

See Also:

Oracle Call Interface Programmer's Guide for more information about the concepts of direct path loading
Size Requirements for Bind Arrays
The bind array must be large enough to contain a single row. If the maximum row length exceeds the size of the bind array, as specified by the BINDSIZE parameter, SQL*Loader generates an error. Otherwise, the bind array contains as many rows as can fit within it, up to the limit set by the value of the ROWS parameter.

The BINDSIZE and ROWS parameters are described in Command-Line Parameters.

Although the entire bind array need not be in contiguous memory, the buffer for each field in the bind array must occupy contiguous memory. If the operating system cannot supply enough contiguous memory to store a field, SQL*Loader generates an error.

Performance Implications of Bind Arrays
Large bind arrays minimize the number of calls to the Oracle database and maximize performance. In general, you gain large improvements in performance with each increase in the bind array size up to 100 rows. Increasing the bind array size to be greater than 100 rows generally delivers more modest improvements in performance. The size (in bytes) of 100 rows is typically a good value to use.

In general, any reasonably large size permits SQL*Loader to operate effectively. It is not usually necessary to perform the detailed calculations described in this section. Read this section when you need maximum performance or an explanation of memory usage.

Specifying Number of Rows Versus Size of Bind Array
When you specify a bind array size using the command-line parameter BINDSIZE (see BINDSIZE (maximum size)) or the OPTIONS clause in the control file (see OPTIONS Clause), you impose an upper limit on the bind array. The bind array never exceeds that maximum.

As part of its initialization, SQL*Loader determines the size in bytes required to load a single row. If that size is too large to fit within the specified maximum, the load terminates with an error.

SQL*Loader then multiplies that size by the number of rows for the load, whether that value was specified with the command-line parameter ROWS (see ROWS (rows per commit)) or the OPTIONS clause in the control file (see OPTIONS Clause).

If that size fits within the bind array maximum, the load continues—SQL*Loader does not try to expand the number of rows to reach the maximum bind array size. If the number of rows and the maximum bind array size are both specified, SQL*Loader always uses the smaller value for the bind array.

If the maximum bind array size is too small to accommodate the initial number of rows, SQL*Loader uses a smaller number of rows that fits within the maximum.
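As a concrete illustration (the values are arbitrary), both limits can be set in the control file's OPTIONS clause, which precedes the LOAD DATA statement, or on the command line:

OPTIONS (BINDSIZE=256000, ROWS=100)

host sqlldr scott/tiger@db control=emp.ctl bindsize=256000 rows=100

With these settings the bind array is capped at 256,000 bytes and at most 100 rows are buffered per insert, whichever limit is reached first.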

Calculations to Determine Bind Array Size
The bind array's size is equivalent to the number of rows it contains times the maximum length of each row. The maximum length of a row is equal to the sum of the maximum field lengths, plus overhead, as follows:

bind array size =
   (number of rows) * (  SUM(fixed field lengths)
                       + SUM(maximum varying field lengths)
                       + ( (number of varying length fields)
                           * (size of length indicator) )
                      )
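As a hedged worked example with made-up numbers: suppose each row has fixed-length fields totaling 20 bytes, plus three varying-length fields whose maximum lengths are 10, 30, and 40 bytes, on a system with a 2-byte length indicator. A single row then needs 20 + (10 + 30 + 40) + (3 * 2) = 106 bytes, so a 100-row bind array requires roughly 10,600 bytes, which must fit within BINDSIZE.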

Many fields do not vary in size. These fixed-length fields are the same for each loaded row. For these fields, the maximum length of the field is the field size, in bytes, as described in SQL*Loader Datatypes. There is no overhead for these fields.

The fields that can vary in size from row to row are:

CHAR

DATE

INTERVAL DAY TO SECOND

INTERVAL YEAR TO MONTH

LONG VARRAW

numeric EXTERNAL

TIME

TIMESTAMP

TIME WITH TIME ZONE

TIMESTAMP WITH TIME ZONE

VARCHAR

VARCHARC

VARGRAPHIC

VARRAW

VARRAWC

The maximum length of these datatypes is described in SQL*Loader Datatypes. The maximum lengths describe the number of bytes that the fields can occupy in the input data record. That length also describes the amount of storage that each field occupies in the bind array, but the bind array includes additional overhead for fields that can vary in size.

When the character datatypes (CHAR, DATE, and numeric EXTERNAL) are specified with delimiters, any lengths specified for these fields are maximum lengths. When specified without delimiters, the size in the record is fixed, but the size of the inserted field may still vary, due to whitespace trimming. So internally, these datatypes are always treated as varying-length fields—even when they are fixed-length fields.

A length indicator is included for each of these fields in the bind array. The space reserved for the field in the bind array is large enough to hold the longest possible value of the field. The length indicator gives the actual length of the field for each row.

Note:

In conventional path loads, LOBFILEs are not included when allocating the size of a bind array.
Determining the Size of the Length Indicator
On most systems, the size of the length indicator is 2 bytes. On a few systems, it is 3 bytes. To determine its size, use the following control file:

OPTIONS (ROWS=1)
LOAD DATA
INFILE *
APPEND
INTO TABLE DEPT
(deptno POSITION(1:1) CHAR(1))
BEGINDATA
a

This control file loads a 1-byte CHAR using a 1-row bind array. In this example, no data is actually loaded because a conversion error occurs when the character a is loaded into a numeric column (deptno). The bind array size shown in the log file, minus one (the length of the character field), is the value of the length indicator.
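For example, if the log for this control file shows that 3 bytes were allocated for the bind array, the length indicator on that system is 3 - 1 = 2 bytes.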

Note:

A similar technique can determine bind array size without doing any calculations. Run your control file without any data and with ROWS=1 to determine the memory requirements for a single row of data. Multiply by the number of rows you want in the bind array to determine the bind array size.
Calculating the Size of Field Buffers