echelon


SQL Magic – Automatically Creating Lookup Tables

I recently had to write a script that would import large amounts of data (~50MB) into a PostgreSQL database every night. I thought long and hard about how this could be accomplished best, I read a lot about it on the usenet and so I decided that the best way would be to use PostgreSQL’s COPY function. The data is available in a .CSV format so everything worked very smoothly.

However after the data is imported it also shall be searchable very easily. Therefor I let the import script create indexes on the data. And as I like to organize and normalize data I thought about a way to automatically creat lookup tables. The following picture describes what I did:

create_lookup_table

This describes the steps necessary to create the lookup table:

  1. We try to delete an probably already existing lookup table.

    DROP TABLE #LOOKUP_TABLE#;

    Delete the table #LOOKUP_TABLE# if it exists.

  2. We create a new lookup table.

    CREATE TABLE #LOOKUP_TABLE#
    ( id SERIAL NOT NULL PRIMARY KEY,
    #LOOKUP_COLUMN# #LOOKUP_COLUMN_TYPE#);

    Create the table #LOOKUP_TABLE# with one column called “id” which is the primary key and a second column called #LOOKUP_COLUMN# which is of the type #LOOKUP_COLUMN_TYPE#.

  3. We fill the lookup table with the values from the large data table.

    INSERT INTO #LOOKUP_TABLE# #LOOKUP_COLUMN#
    (SELECT DISTINCT #DATA_TABLE_COLUMN# FROM #DATA_TABLE#);

    Insert the following data into the table #LOOKUP_TABLE#: every distinct piece of information in the column #DATA_TABLE_COLUMN# from the table #DATA_TABLE#.

  4. We add a column to the large data table that will hold the reference key to the lookup table.

    ALTER TABLE #DATA_TABLE# ADD COLUMN #LOOKUP_COLUMN#_id INTEGER;

    Add an INTEGER column to the #DATA_TABLE# and call it #LOOKUP_TABLE#_id.

  5. We fill that column in the large data table with the appropriate key.

    UPDATE ONLY #DATA_TABLE#
    SET #LOOKUP_COLUMN#_id = #TARGET_TABLE#.id
    FROM #TARGET_TABLE#
    WHERE #DATA_TABLE#.#DATA_TABLE_COLUMN# = #LOOKUP_TABLE#.#LOOKUP_COLUMN”;

    Now comes the cool part: We update the #LOOKUP_COLUMN#_id column in the #DATA_TABLE# table. We write the the id into that colum whereever the #DATA_TABLE_COLUMN# from the #DATA_TABLE# matches the #LOOKUP_TABLE_COLUMN# from the #LOOKUP_TABLE#.

  6. We create a foreign key on that referencing column.

    ADD CONSTRAINT #DATA_TABLE# #LOOKUP_COLUMN#_id_fk FOREIGN KEY (#LOOKUP_COLUMN#_id) REFERENCES #LOOKUP_TABLE#(id) MATCH FULL;

    Here we simply add the foreign key #LOOKUP_COLUMN#_id_fk to the #DATA_TABLE# on the #LOOKUP_COLUMN#_id column.

  7. We delete the old colum that holds the original values.

    ALTER TABLE #DATA_TABLE# DROP COLUMN #DATA_TABLE_COLUMN# CASCADE;

    Here we drop the #DATA_TABLE_COLUMN# in the #DATA_TABLE#.

I will probably post a PHP function that does all the above on phpclasses.org in a few days.



No Comments

Sorry, the comment form is closed at this time.