I recently had to write a script that would import large amounts of data (~50MB) into a PostgreSQL database every night. I thought long and hard about how this could be accomplished best, I read a lot about it on the usenet and so I decided that the best way would be to use PostgreSQL’s COPY function. The data is available in a .CSV format so everything worked very smoothly.
However after the data is imported it also shall be searchable very easily. Therefor I let the import script create indexes on the data. And as I like to organize and normalize data I thought about a way to automatically creat lookup tables. The following picture describes what I did:
This describes the steps necessary to create the lookup table:
- We try to delete an probably already existing lookup table.
DROP TABLE #LOOKUP_TABLE#;
Delete the table #LOOKUP_TABLE# if it exists.
- We create a new lookup table.
CREATE TABLE #LOOKUP_TABLE#
( id SERIAL NOT NULL PRIMARY KEY,
#LOOKUP_COLUMN# #LOOKUP_COLUMN_TYPE#);Create the table #LOOKUP_TABLE# with one column called “id” which is the primary key and a second column called #LOOKUP_COLUMN# which is of the type #LOOKUP_COLUMN_TYPE#.
- We fill the lookup table with the values from the large data table.
INSERT INTO #LOOKUP_TABLE# #LOOKUP_COLUMN#
(SELECT DISTINCT #DATA_TABLE_COLUMN# FROM #DATA_TABLE#);Insert the following data into the table #LOOKUP_TABLE#: every distinct piece of information in the column #DATA_TABLE_COLUMN# from the table #DATA_TABLE#.
- We add a column to the large data table that will hold the reference key to the lookup table.
ALTER TABLE #DATA_TABLE# ADD COLUMN #LOOKUP_COLUMN#_id INTEGER;
Add an INTEGER column to the #DATA_TABLE# and call it #LOOKUP_TABLE#_id.
- We fill that column in the large data table with the appropriate key.
UPDATE ONLY #DATA_TABLE#
SET #LOOKUP_COLUMN#_id = #TARGET_TABLE#.id
FROM #TARGET_TABLE#
WHERE #DATA_TABLE#.#DATA_TABLE_COLUMN# = #LOOKUP_TABLE#.#LOOKUP_COLUMN”;Now comes the cool part: We update the #LOOKUP_COLUMN#_id column in the #DATA_TABLE# table. We write the the id into that colum whereever the #DATA_TABLE_COLUMN# from the #DATA_TABLE# matches the #LOOKUP_TABLE_COLUMN# from the #LOOKUP_TABLE#.
- We create a foreign key on that referencing column.
ADD CONSTRAINT #DATA_TABLE# #LOOKUP_COLUMN#_id_fk FOREIGN KEY (#LOOKUP_COLUMN#_id) REFERENCES #LOOKUP_TABLE#(id) MATCH FULL;
Here we simply add the foreign key #LOOKUP_COLUMN#_id_fk to the #DATA_TABLE# on the #LOOKUP_COLUMN#_id column.
- We delete the old colum that holds the original values.
ALTER TABLE #DATA_TABLE# DROP COLUMN #DATA_TABLE_COLUMN# CASCADE;
Here we drop the #DATA_TABLE_COLUMN# in the #DATA_TABLE#.
I will probably post a PHP function that does all the above on phpclasses.org in a few days.