MrUll's

Juni 22, 2004

Oracle Parameter NLS_COMP

As mentioned in a former article, sorting sequences of chars and numbers may differ. A binary (ascii based) sort produces a 0-9, A-Z sequence while a linguistic sort (e.g. german) gives A-Z, 0-9. in an order by clause this can be hardcoded directly via the usage of the nlssort function. e.g.
select col1 from ... order by nlssort( col1, 'NLS_SORT=GERMAN') desc

Controlling this can also be done using database parameters or session scoped parameters so that no nlssort function must be applied.

Imagine an application allows browsing forward and backward in an ordered result list. Further, assume this result is presented using a linguistic sort sequence (e.g. GERMAN, see above). With this, it is likely that this result is done with the help of character comparisons inside a where clause.

select shortname, description,id from
(
select t1.shortname, t2.description, t1.id
from t1 natural join t2
where t1.shortname > 'Xeno'
order by nlssort( t1.col1, 'NLS_SORT=GERMAN') asc
)
where rownum < 13


Assume this statement presents the last page after some browsing in a forward direction has been done. Because of the german sort, it is expected that shortnames starting with numbers are presented within the last pages. For this to work the comparison operator has to know whether characters containing numbers are greater then 'Z' or not. This depends on the value of initialazion parameter NLS_COMP. With BINARY as default value '9'<'A' is true.

SQL> -- default NLS_COMP is BINARY
SQL> select * from dual where '9'<'A';

D
-
X

SQL> alter session set NLS_COMP=ANSI;
...
SQL> select * from dual where '9'<'A';

no rows selected

SQL> select * from dual where '9'>'Z';

D
-
X


Within this issue there are some side effects and further considerations.
Index (ordinary B* indexes) storage is always BINARY. i found no way to create an index with an ANSI leaf-storage sequence . With this in mind it is possible to implement a paging of results without using inline views as above. Its possible to use INDEX_ASC or INDEX_DESC optimizer hints and thus to omit an order by clause. This implements an ISAM access using ordinary oriented tables. (for ISAM on index oriented tables (IOT) without an order by clause the column for which ordering and restrictions occur has to be the primary key). This works only if a binary sorting (NLS_SORT==NLS_COMP==BINARY) is wanted.

As shown above, its possible to apply nlssort() for ANSI sorting at the statement level. this produces no side effects. But as shown, results may not be correct. For the comparision operator, this fine grained behaviour cannot be achieved. if the session scoped value of NLS_COMP is set to ANSI all SQL statements (in your session) may be affected with this change. This can lead to severe performance impacts for an application. As an index traversal access path is no longer possible (leaf storage is binary but we are doing an ANSI-sort-like restriction in the where clause) full scans occur. This is getting even more worse when an application is using an connection pooling.

when switching to an ANSI sort, its possible to workaround this at the statement level. For a forward scroll:

...
where ( t1.shortname > 'Xeno' or t1.shortname <= '9' )
order by nlssort( t1.col1, 'NLS_SORT=GERMAN') asc
...

Shortnames characters beginning with numbers are included (because of or ...) and the usage of nlssort puts them to the rear end of the result list. For a backward scroll numbers has to be suppressed:

...
where ( t1.shortname < 'Ceno' and t1.shortname >= 'A' )
order by nlssort( t1.col1, 'NLS_SORT=GERMAN') asc
...


Juni 15, 2004

Workaround for awk record too long problem

logfile processing with awk sometime aborts with an error like:

awk: record ` Exception: oracle....' too long
record number 1.02527e+06
Broken Pipe


Solution: shut down and deinstall oracle .)

Obviously there seems to be an upper limit for the record size (linesize) which awk is able to process. An approach is to use nawk if gawk is not present. nawk has a limitation too, but it allows larger recordsizes. Gnu awk (gawk) does not have this restriction. with gawk it is possible even to process large sized logfiles (about 400M). You can get gawk from http://sunfreeware.com/

Juni 14, 2004

Oracle external organized Table for Tomcat accesslog analysis

This posting shows by example how to query a tomcat access log via SQL. This works only with an Oracle DB. Key feature for this is the "external Table" support. Tabledata is populated with sqlloader internally, this allows nearly every fileformat to be processed. SQL Loader Syntax ist documented in the "utility" manual (Oracle9i Database Utilities). this example works with Version )9i Release 2 (called 9.2 but exactly 9.0.2.x).

A tomcat accesslog has the following lines:
10.19.33.229 - - [18/May/2004:08:28:57 1000] "POST /kat3/servlet/kat3/action/Stammdaten.Artikel.Detail HTTP/1.1" 200 690


you need an oracle account and the rights to create/query a table & directory. the first step is to create an directory object so that an access is possible. Syntax:
create directory ext_dir_tomcatlogs as  '/export/home/tomcat/logs';

specify your filesystem directory where your logfile of interest exists. Next step is to create the table:


create table ext_accesslog
( ip varchar2(30),
c2 varchar2(30),
c3 varchar2(30),
accessdate varchar2(100),
c1000 varchar2(20),
url varchar2(1000),
c6 varchar2(40),
c7 varchar2(50)
)
organization external
(
type oracle_loader
default directory ext_dir_tomcatlogs
access parameters (
fields terminated by WHITESPACE
OPTIONALLY enclosed by '"'
(
ip, c2, c3, accessdate, c1000, url, c6, c7
))
location ('localhost_access_log.2004-05-13.txt')
)
reject limit unlimited
;


Now some analysis can be done e.g.

select * from ext_accesslog where url not like '%.gif HTTP/1.1' order by ip, accessdate 


select ip, min(accessdate), max(accessdate), count(*) from ext_accesslog group by ip


Awk for accesslog analysis e.g. sorting

Here are some awk helpers to ease accesslog analysis. particular in cases where one logfile holds an exception and timestamp and you need to determine which actions lead to this.
Usual tasks are:
  • find out which IPs were active in a specific time-window.
  • which IP Adress caused the exception
  • reordering accesslog lines by IP, Timestamp for having a look on the sequence of requests.

    This first awk skript is to select all requests around a specific point of time:

    cat localhost_access_log.2004-05-27.txt | awk '/\[27\/May\/2004:05:(05|06)/{ print $0 }' > t.log


    the regexp /\[27\/May\/2004:05:(05|06)/ is to select all requests which were requested in minute five and six.

    Note that awk offers associative arrays. Oracle Pl/Sql offers this too with declarations like
    "type t_arr is table of varchar2(20) index by varchar2(54)"
    . With this its possible to use IP Address as index values. See next example to implement "select IP, count(*) from accesslogfile group by IP"

    cat localhost_access_log.2004-05-27.txt | nawk '{ if ($1 in arr) {arr[$1]++} else {arr[$1]=1} } END { for (x in arr) print x " " arr[x] " requests" }'

    For counting in a specific timeframe just use a regexp as in the first script.

    Next script uses a multidimentional array to reorder accresslogentries (lines) by IP and timestamp:

    { if ($1 in arr) {arr[$1]++} else {arr[$1]=1} requests[$1, arr[$1]]=$0 } END { for (ip in arr) { for(x=1;x<=arr[ip];x++) {print requests[ip, x]}}}


    Note that this is getting RAM excessive for large accesslogs.

    if theres a running Oracle instance present, its possible to create a table with external storage (our logfile) to do e.g. ordering.

  • Juni 04, 2004

    Oracle SQL Order by, sequence of numbers and characters

    this posting explains how to control the ordering in which characters and numbers are returned when a order by clause is used. to keep it simple i do not explain how to arrange this without a order by clause and an index organized table. if the order by clause is used without the nlssort function then results may vary from time to time. this is because of some session scoped parameter values affect the sequence of the result. These are NLS_SORT and NLS_LANGUAGE. if NLS_SORT is empty, its value is derived from NLS_LANGUAGE / NLS_LANG.

    Often NLS_SORT defaults to BINARY. This is the case when NLS_LANGUAGE is set to AMERICAN. The binary sort method sorts by byte value. On ASCII based platforms this results in the sequence: numbers,uppercase chars,lowercase chars. on the other side it is possible to set a specific value eg. GERMAN. See example:



    SQL> alter session set NLS_SORT=BINARY;

    SQL> select * from ( select 'r' as dummy from dual union select '9' from dual union select 'R' from dual ) order by 1;

    D
    -
    9
    R
    r

    SQL> alter session set NLS_SORT=GERMAN;

    SQL> select * from ( select 'r' as dummy from dual union select '9' from dual union select 'R' from dual ) order by 1;

    D
    -
    r
    R
    9



    hardcoding is an approach to be independent of NLS parameters. This is done with the help of the nlssort function. See example:



    SQL> select * from ( select 'r' as dummy from dual union select '9' from dual
    union select 'R' from dual ) order by nlssort(dummy,'NLS_SORT = BINARY ');

    D
    -
    9
    R
    r

    SQL> select * from ( select 'r' as dummy from dual union select '9' from dual
    union select 'R' from dual ) order by nlssort(dummy,'NLS_SORT = GERMAN ');

    D
    -
    r
    R
    9



    Note that you have to use the column name within the nlssort function, dont use nlssort(1, ...) as the function uses 1 as a constant and not the first column.

    For most counties the derived NLS_SORT setting produces a "AaBbCc ... 0123456789" sorting sequence by default. but take care when using AMERICAN, Arabic, Icelandic, Polish settings; this results in "0123456789 AaBbCcDd ..."

    to get the relevant NLS parameter values for your DB you can query the view DATABASE_PROPERIES, to get the values in the session scope use NLS_SESSION_PARAMETERS.

    Juni 02, 2004

    About RSS

    I played around with RSS feeds and readers; after several weeks i rate it as quite useful. my favourite rss reader is "rssreader" - its freeware but only supports the window platforms; you need an instalation of the .net framework. See website
    http://www.rssreader.com/

    some feeds (updating frequently)

  • (german) computerwoche online http://www.computerwoche.de/rss/news.xml
  • (german) dpa presseportal http://www.presseportal.de/rss/presseportal.rss2
  • Eclipse User Community http://www.jsurfer.de/backend.php
  • Yahoo! News - Health http://rss.news.yahoo.com/rss/health
  • Yahoo! News - Science http://rss.news.yahoo.com/rss/science
  • Yahoo! News - Technology http://rss.news.yahoo.com/rss/tech
  • Yahoo! News - World News http://rss.news.yahoo.com/rss/world
  • Slashdot News for nerds, stuff that matters http://slashdot.org/slashdot.rdf




  •