syntax.us Let the syntax do the talking
Blog Contact Posts Questions Tags Hire Me

Question:
How do I copy truefx data into Postgres?

I often encounter machine learning use-cases.

Machine learning algorithms are hungry for data.

While searching google one day, I found a site named truefx.com which offers free Forex data in CSV format.

Once I understood what the site was offering me, a question which came to my mind:

Is it possible to look at past Forex data and predict future Forex price directions?

After I found the site I spent a few hours building predictive models.

Early in that process I wrote some scripts to copy the truefx data into Postgres.

Those scripts are the topic of this post.

Usually when I write these types of scripts I work from the outside in.

Almost always the outside script is a shell script.

The shell script I wrote to handle this task is displayed below:
#!/bin/bash

# ~/cjb4/fx/truefx_load.bash

cd ~/cjb4/fx/

for file in ~/truefx/*USD*zip
do
  unzip -p $file > /tmp/truefx.csv
  /bin/ls -l $file /tmp/truefx.csv
  ~/cjb4/fx/psqlmad.bash -f truefx.sql
done
The above shell script depended on another shell script named psqlmad.bash so I wrote that next:
#! /bin/bash

# psqlmad.bash

# This is a simple wrapper script.
# It connects me to both the madlib schema and database.

PGPASSWORD=madlib psql -ah 127.0.0.1 -U madlib -d madlib -p 5432 -P pager=no $@

exit
Also the script depened on a SQL script named truefx.sql so I wrote that next:
--
-- ~/cjb4/fx/truefx.sql
--

-- I use this script to load zip files in ~/truefx/
-- I use a shell command to unzip each file into
-- /tmp/truefx.csv

-- So, this sql script only knows about /tmp/truefx.csv

CREATE TABLE IF NOT EXISTS truefx_stage1
(
pair   VARCHAR(7)
,ttime TIMESTAMP
,bid   float8
,ask   float8
)
;

TRUNCATE TABLE truefx_stage1;

COPY truefx_stage1
(
pair
,ttime
,bid
,ask
) FROM '/tmp/truefx.csv' WITH csv
;

-- rpt
SELECT COUNT(pair),MIN(ttime),MAX(ttime),MIN(bid),MAX(ask) FROM truefx_stage1;

CREATE TABLE IF NOT EXISTS truefx_stage2
(
pair   VARCHAR(7)
,ttime TIMESTAMP
,cp    FLOAT8
)
;

TRUNCATE TABLE truefx_stage2;

INSERT INTO truefx_stage2(pair,ttime,cp)
SELECT
pair
,date_trunc('hour', ttime) + INTERVAL '5 min' * ROUND(date_part('minute', ttime) / 5.0) AS ttime
,(bid+ask)/2 AS cp
FROM truefx_stage1 
;

CREATE TABLE IF NOT EXISTS truefx
(
pair   VARCHAR(7)
,ttime TIMESTAMP
,cp    FLOAT8
)
;

INSERT INTO truefx(pair,ttime,cp)
SELECT
pair
,ttime
,ROUND(AVG(cp)::NUMERIC,4)::float8 AS cp
FROM truefx_stage2
GROUP BY pair,ttime
;
\q
Here is a screendump of me getting them to work together:

dan@z2.z2 ~/cjb4/fx $ 
dan@z2.z2 ~/cjb4/fx $ 
dan@z2.z2 ~/cjb4/fx $ ./truefx_load.bash
-rw-rw-r--. 1 dan dan  9288113 Dec  7 08:50 /home/dan/truefx/AUDUSD-2014-08.zip
-rw-rw-r--. 1 dan dan 76196524 Dec  8 08:38 /tmp/truefx.csv
--
-- ~/cjb4/fx/truefx.sql
--
-- I use this script to load zip files in ~/truefx/
-- I use a shell command to unzip each file into
-- /tmp/truefx.csv
-- So, this sql script only knows about /tmp/truefx.csv
CREATE TABLE IF NOT EXISTS truefx_stage1
(
pair   VARCHAR(7)
,ttime TIMESTAMP
,bid   float8
,ask   float8
)
;
psql:truefx.sql:18: NOTICE:  relation "truefx_stage1" already exists, skipping
CREATE TABLE
TRUNCATE TABLE truefx_stage1;
TRUNCATE TABLE
COPY truefx_stage1
(
pair
,ttime
,bid
,ask
) FROM '/tmp/truefx.csv' WITH csv
;
COPY 1664537
-- rpt
SELECT COUNT(pair),MIN(ttime),MAX(ttime),MIN(bid),MAX(ask) FROM truefx_stage1;
  count  |           min           |           max           |   min   |   max   
---------+-------------------------+-------------------------+---------+---------
 1664537 | 2014-08-01 00:00:00.035 | 2014-08-29 21:00:00.039 | 0.92353 | 0.93761
(1 row)

CREATE TABLE IF NOT EXISTS truefx_stage2
(
pair   VARCHAR(7)
,ttime TIMESTAMP
,cp    FLOAT8
)
;
psql:truefx.sql:40: NOTICE:  relation "truefx_stage2" already exists, skipping
CREATE TABLE
TRUNCATE TABLE truefx_stage2;
TRUNCATE TABLE
INSERT INTO truefx_stage2(pair,ttime,cp)
SELECT
pair
,date_trunc('hour', ttime) + INTERVAL '5 min' * ROUND(date_part('minute', ttime) / 5.0) AS ttime
,(bid+ask)/2 AS cp
FROM truefx_stage1 
;
INSERT 0 1664537
CREATE TABLE IF NOT EXISTS truefx
(
pair   VARCHAR(7)
,ttime TIMESTAMP
,cp    FLOAT8
)
;
psql:truefx.sql:58: NOTICE:  relation "truefx" already exists, skipping
CREATE TABLE
INSERT INTO truefx(pair,ttime,cp)
SELECT
pair
,ttime
,ROUND(AVG(cp)::NUMERIC,4)::float8 AS cp
FROM truefx_stage2
GROUP BY pair,ttime
;
INSERT 0 5873
\q

snip ...

psql:truefx.sql:58: NOTICE:  relation "truefx" already exists, skipping
CREATE TABLE
INSERT INTO truefx(pair,ttime,cp)
SELECT
pair
,ttime
,ROUND(AVG(cp)::NUMERIC,4)::float8 AS cp
FROM truefx_stage2
GROUP BY pair,ttime
;
INSERT 0 5094
\q
dan@z2.z2 ~/cjb4/fx $ 

Next, I ran a simple query to help me see the truefx data I had just loaded into Postgres:
dan@z2.z2 ~/cjb4/fx $ 
dan@z2.z2 ~/cjb4/fx $ 
dan@z2.z2 ~/cjb4/fx $ ./psqlmad.bash 
psql (9.2.9)
Type help for help.
madlib=# SELECT pair,COUNT(pair),MIN(ttime),MAX(ttime) FROM truefx GROUP BY pair;
  pair   | count |         min         |         max         
---------+-------+---------------------+---------------------
 EUR/USD | 22511 | 2014-08-01 00:00:00 | 2014-11-28 22:00:00
 AUD/USD | 22248 | 2014-08-01 00:00:00 | 2014-11-28 22:00:00
 GBP/USD | 21158 | 2014-08-01 00:00:00 | 2014-11-28 22:00:00
 NZD/USD | 20324 | 2014-08-01 00:00:00 | 2014-11-28 00:00:00
 USD/CAD | 14018 | 2014-08-01 00:00:00 | 2014-11-28 00:00:00
 USD/JPY | 23121 | 2014-08-01 00:00:00 | 2014-11-28 22:00:00
(6 rows)

madlib=# 
madlib=# 

syntax.us Let the syntax do the talking
Blog Contact Posts Questions Tags Hire Me