I now work with a mainframe. I realised once more that numeric variables have different formats on a mainframe. As we do not always have a clear description, we should somehow recognise how the data are stored. Let us first show how numeric data can be stored.
As an example, I take the number 123.

In the upper row, we see the number 123. This is how we see externally this number. In the two rows below, we see the internal format: F1F2F3. This is the most common format: PIC(n) display in Cobol terminology. When we use SAS, we must use the s370fzdu6.2 format, the so-called unsigned zoned decimal.
This brought me the idea to show 6 different internal formats of 123, along with the Cobol format and the SAS format.
Let me give you the results:
How does 123 look Likely Cobol representation SAS format
internally? externally
F0F0F0F1F2F3 000123 PIC 9(n) DISPLAY s370fzdu6.2
C0F0F0F1F2F3 {00123 PIC S9(n) DISPLAY SIGN LEADING S370FZDL6.2
4EF0F0F1F2F3 +00123 PIC S9(n) DISPLAY SIGN LEADING SEPARATE S370FZDS6.2
F0F0F1F2F34E 00123+ PIC S9(n) DISPLAY SIGN TRAILING SEPARATE S370FZDT6.2
00000000123C pic S(9) comp-3 s370fpd6.2
00000000123F PIC 9(n) PACKED-DECIMAL s370fpdu6.2
In fact, two different formats are used:
I recently encountered a situation whereby surrogate keys were used as primary keys. It looks as a reasonable choice; since the so-called SCD2 mechanism was also used to store historic information. The mechanism then works as follows: if we get updated information on a record that is already in the table, we insert a new record with the new information. As a consequence a new surrogate key is issued that acts to identify the new record. This implies that an object that regularly needs to be updated also repeatedly has new surrogate keys that regularly needs to be inserted. Each value of a surrogate key corresponds to a new record that was inserted as a result from new information on the object.
We know that situation from the dimensional models that Kimball advocates. The dimensions are regularly updated and (hence) objects regularly are attached to new values of the surrogate keys that are used as a primary key. If new facts are added, these new surrogate keys will be used. See the scheme below:

Hence, in the Kimball situation, the usage of surrogate keys will not lead to major problems. Each time an update on the dimension is received, a new record will be inserted. A new surrogate key will then be used. If new facts are received, the surrogate keys are looked up from the dimensions and they are used as a foreign key to link the facts to dimensions.
So far, so good.
However, the dimensions may also be linked to other tables. The linkage may be maintained with these surrogate keys. The organisation asked to update these surrogate keys that are used as foreign keys to display the most recent situation.
This leads to a trickling down effect of a change as a consequenc from using surrogate keys.

See the scheme above.
1: A change on a first table arrives. As a consequence, a new record is created that captures the change. As a surrogate key is used as a primary key, a new value for the surrogate key is retrieved.
2: The surrogate key value is also used in a second table as a foreign key. As the organisation asks to update such information, a new record must be inserted that captures the new foreign key value. New record implies a new surrogate key.
3: This induces another round of updates. The new surrogate key may on its turn be used in yet a third table. New record in the third table. New surrogate key.
4: Etc.
What happened is that a complicated and interrelated structure does not sustain a surrogate key structure that is volatile with respect with the underlying objects. If each new representation of the object leads to a new primary key, we are faced with a continous task of updating the foreign keys. A change in one row of a fundamental table may trickle down to a series of updates of foreign keys elsewhere. It not only leads to updates of foreign keys that refer to the product table. It also leads to updates of foreign keys that point to tables that point to product tables etc.
Consequence. If we have an interrelated model, we should maintain a foreign key that does not change its value with respect to the underlying object. If we use a SCD2 mechanism , we should use a primary key that is composed of a stable surrogate key plus the date at which the change is implemented. This will uniquely identify the record. It also avoids the constant update of foreign key as we may issue the stable surrogate key.
About hundred years ago, I worked on a mainframe. I those days, the mainframe was basically the mainstream computer that many people used. In those days, financial institutions started automating their business processes.
In the nowadays world, these mainframes are still the workhorses of these financial institutions. But how did we access them?
Now, I am back in a finncial institute where getting access to such an environment is very handy as the real work is done there. The PC is only used for Email purposes and minor Word editing. Unix is only used in the marketing department but the real stuff is in a mainframe.
So I had to refresh my mainframe knowledge.
But how?
I first got a free access to a mainframe. I found a possibility with http://www.efglobe.com/cgi-bin/mainframe/mainuser .
.
Within two days a clear message was given – I asked access Friday night and got replay Sunday afternoon. Good work. The message gave a clear indication on how to access the mainframe.
But to access a mainframe, you need a terminal application. This can be downloaded from http://x3270.bgp.nu/download.html . This is an open source application that emulates a 3270 terminal.
I recovered the knowledge with an old book Doug Lowe, “MVS TSO”. These three items (a book, a client terminal app and a TSO account) will help to renew my mainframe knowledge. Back in time.
This weekend had bad weather. Rain pouring from the sky and a drop of snow.
Such weather is good to undertake something that is absolutely useless like writing a Perl programme on Linux to access a remote Oracle database.
The first hurdle to overcome is to install an Oracle client along with some Perl routines that will allow us to write Perl code to get things done. This hurdle can be overcome with help of http://usestrict.net/2009/07/12/perl-installing-dbdoracle-on-ubuntu-904-and-oracle-instant-client/ . This site gives a nice cookbook to install both the Oracle client and the Perl modules.
It took me about 4 hours to get this done. But then I had the software that I needed. The actual code that I wrote is:
#!/usr/bin/perl -wT
use Oraperl;
print "Content-type: text/html\r\n\r\n";
print "<!DOCTYPE HTML PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN>";
print "<html><head>";
print "<title>Test Perl Script</title>";
print "</head>";
$ENV{"ORACLE_HOME"} = "/usr/lib/oracle/xe/app/oracle/product/10.2.0/server";
$hConnection = &ora_login("xe", "uid", "psw") || die $ora_errstr;
$hQuery = &ora_open($hConnection, "SELECT naam, nummer from scott.klant") || die $ora_errstr;
@asQueryReturn = &ora_fetch($hQuery);
while (@asQueryReturn != 0)
{
print "<p>naam: $asQueryReturn[0] nummer: $asQueryReturn[1]";
@asQueryReturn = &ora_fetch($hQuery);
}
&ora_close($hQuery);
&ora_logoff($hConnection);
exit;
This code is not very well documented on the Internet. The best resource is http://www.tutorialspoint.com/perl/oraperl_manual.htm . Nevertheless I could not get this code right until I saw in another source (http://www.cs.purdue.edu/homes/cs290w/perlLecs/PerlOracle.html ) that you also give the full path to ORACLE_HOME as an environmental variable. This is a bit strange as this variable is also known on Linux, but without it, the code did not run.
I wrote this piece of code as a cgi programme. To retrieve the results, I had to start the output with: print “Content-type: text/html\r\n\r\n”; and print “<!DOCTYPE HTML PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN>”; as to avoid the strings to get lost in the Internet. Only after these strings, the output is interpreted as HTML and it can be displayed in the browser.
Sorry to say, but I love Microsoft Visio. Plse do not spread the word. But from time to time, I had to create a document that served two distinct groups: a group who was just interested in getting a highlight and a group who wanted to go into details. In earlier times, I created two documents: one for the highlights and one for the details. But I then had to maintain two documents. A change in the highlights has to be introduced in the details as well.
This can be shown in next figure:

The highlights are included in the first layer only; the details are composed of the first and second layer. If you maintain this in two documents, you will have to introduce a change in the first layer in two documents simultaneously.
However, this is not necessary as Viso knows the concepts of layers. One could add some objects to the first layer and some to the second. Subsequently, one may choose to print either the first or the second layers or both. This can be directly started when several layers are created.
Take view>Layer Properties

Once the layers are created, one could add the objects to the layer. Take format>Layer:

This could make life easier.
I owe thanks here to gert-Jan Kooren who pointed me at the Vision feature: it allows us to create complicate diagrams, that can also be shown at a lower level of detail. Nice!
For some reason, I always overlooked the possibility to include pictures in a database. I decided to overcome this flaw and started experimenting with it.
The first step was to create a table in Access. In Access, we have the possibility to use the datatype “OLE Object”. A field with this datatype can be used to store pictures. The inclusion of a picture is then straightforward. Rightclick on the field brings you in a menu that allows you to include a picture:

That was really easy.
Let us move to Oracle.
In Oracle, it is really easy sailing. Let us first create a table.
CREATE TABLE "SCOTT"."BLOBJE" ( "NUMMER" NUMBER, "PLAATJE" BFILE )
Then, we create a storage where to put the images in.
CREATE OR REPLACE DIRECTORY PLAATJE_DIR AS '/home/tom/oracle';
This is then followed by storing a nice picture in the storagearea (=/home/tom/oracle). Let us say, we stored a file “thumbsup.gif” there.
The final step is to store a record in the table that was created first:
INSERT INTO blobje VALUES(4,bfilename('PLAATJE_DIR','thumbsup.gif'));
We can then see the picture from within an Oracle table:

Then with Teradata, it was not so simple. Like in Oracle, I created a table first:
CREATE SET TABLE financial.blobje
(
plaatje BLOB(2097088000),
nummer INTEGER NOT NULL)
UNIQUE PRIMARY INDEX ( nummer );
I then changed the mode of Teradata SQL Assistant by listening to import. This was accomplished by File>Import Data. I then started a query, saying:
insert into financial.blobje(nummer,plaatje) values (3,?B);
.
This started a dialogue where a reference was made to a control file that contained the name of the picture. The controlfile was stored in the same directory as the picture and it only contained the name of the picture.
This is not trivial. Nor could it be found in Teradata manual (at least I couldn’t). The information was retrieved from “http://developer.teradata.com/applications/articles/large-objects-part-1-loading”.
In this website, I found another solution that uses the BTEQWIN client. Let us first create a table:
create table financial.blobje(
id varchar (80),
binary_lob binary large object
)
unique primary index (id);
Then store several pictures in a directory (say in D:\tmaanen\Desktop\). Say we store “thumpsup.gif” and “thumbsdown.gif” in tht directory.
Also create an import.ctl file that contains:
D:\tmaanen\Desktop\thumbsup.gif,xxxx D:\tmaanen\Desktop\thumbsdown.gif,yyyy
In BTEQWIN, we create a loop that reads the lines in this import.ctl file. This can be achieved by:
.import vartext ‘,’ LOBCOLS=1 file=’D:\tmaanen\Desktop\import.ctl’;
followed by:
.repeat * using (lobFile blob as deferred,id varchar(40)) insert into financial.blobje (id, binary_lob) values (:id, :lobFile);
It is not really easy, but it can be admittingly done.
Uoef:

I installed teradata recently. The installation was really straight forward. And as a bonus, you get tons of client tools. It is possible to use several clients to start your sql.
A popular tool is BTEQWIN. However, if you google on BTEQWIN, you get lots of questions that ask: how do I connect to the Teradata server. The answer is quite easy (but not very well documented).
If you use BTEQWIN on your server, you issue the command: .logon =tpid=/dbc. The tpid can be found in the hosts file in C:\Windows\System32\drivers\etc. Most likely you will something like: 127.0.0.1 localhost localtdcop1. Leave out the “cop1″ and you have your tpid. In this case: .logon localtd/dbc. You get prompted for the password and you go.
If you use BTEQWIN on a client, the situation is slightly different: use .logon =server=/dbc. I just used the ip-address. This looks like: .logon 192.168.1.72/dbc. I got prompted for the password and I started.
Strangely enough, I found it difficult to find a clue how to use the .logon command. Hum – now we know.
Yesterday, I started installing Oracle. I decided to download the newest version: version 11g R. I used windows xp as a platform with DHCP switched off. It really looked simple: the screens look simple and the questions asked were straight forward. I let it go for about 2 hours and after this time I returned to the machine and I looked if everything was successfull. It certainly was: an encooraging “successfull” looked at me.
I checked the installation by a command “tnsping”. This command was issued at the same machine where Oracle was installed.
I then issued the command “sqlplus uid/psw” that also gave access to the Oracle database.
Finally, I installed TOAD on the same machine as Oracle installed and also TOAD gave access to the database.
Much to my surprise, I could not get access from another machine as the machine on which I had installed Oracle. I got an error saying “TNS-12541: TNS: geen listener”. I really did not understand as to why this happened. If you google this error, you get ideas on network problems, but this was not the case. I checked every possible typo but this did not happen either.
The clue as to why the error happened came as I compared the outcomes form the tnsping command. From the server, it looked like:
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orcl)))
OK (20 msec)
look at the host! It reads localhost where I expected to see the serveraddress.
The outcome from the remote computer was:
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (HOST = 192.168.1.9)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = ORCL)))
TNS-12541: TNS: geen listener
the host is 192.168.1.9 which in itsself is ok.
Apparently the listener on the server listens to the localhost whereas it had to listen to the server address 192.168.1.9.
The solution was twofold.
First I changed the hosts file on the server in \windows\system32\drivers\etc. I commented out the localhost and I introduced the serveraddress.
Secondly, I changed the tnsnames.ora, listener.ora that can be found in \network\Administrator on the server. Every “localhost” reference was replaced by the serveraddress 192.168.1.9.
This seemed to help: the listener started to listen to the network address and it was possible to access the database from outside.
Oops – took a few hours but Oracle was up and running.
Yesterday evening, I received a copy of Expressor. This is a free ETL tool which made a very good impression to me.
I downloaded the copy from http://www.expressor-software.com . The download and subsequent installation is very simple. The file to be downloaded is about a 100MB and the installation is straightforward: some legal screens and a few hits on the ” next button” .
Once the tool was installed, I started my watch to see how much time was needed to create a dataflow. I wanted to extract the values from a table and write these to a file. It took me 18 minutes to create this flow and have it run.
The setup of the tool is straight forward. The first things to do is to set up the data connections (to Oracle, ODBC, SQL Server etc).
Once this is done, one could create the input- and output logic. Once could decribe what data are to be read and how the data are written to a target database. Finally one could describe the transformation in data betwee in- and output. Everything was positioned at logical / easily understandable locations within the programme.
I really liked this programme. It came across as an ETL programme without the issues that make difficult in an ETL environment. Like user access: difficult to understand, more difficult to implement and impossible to debug. Within expressor no user access module (thank God).
I was asked a few days to write a SQL that would retrieve the one but latest row. Take as an example a few rows below: we have a several functions with their min salary. The question is: which function earns the one but highest minumum salary. In the rows below, we have minimum salary 20000, 15000, 3000 and 8200. The outcome would be the row where the minimum salary 15000 is found as we should disregard the highest number (here 20000).
| JOB_ID | JOB_TITLE | MIN_SALARY | MAX_SALARY |
|---|---|---|---|
| AD_PRES | President |
20000
|
40000
|
| AD_VP | Administration Vice President |
15000
|
30000
|
| AD_ASST | Administration Assistant |
3000
|
6000
|
| FI_MGR | Finance Manager |
8200
|
16000
|
To this problem, we have two solutions:
One solution is:
SELECT rownum, min_salary from (select min_salary from jobs order by min_salary desc)A ;
Here, we order the table first, then return the rows with the rownumber. This could then be filtered on rownumber=2.
The second solution is:
SELECT min_salary,
RANK() OVER (ORDER BY min_salary desc) "rank"
FROM jobs;
This makes use of the so-called analytical functions. It leads to the same result: after a filter on RANK()=2, we have the desired row.