Fractions in postgresql

Sql and postgresql in particular has no native support for fractions. Sometimes you need to implement rational numbers but can’t accept rounding. This is a requirement very common in the financial sector.

So what solutions are there for postgresql in particular. One good solution is the pgmp postgresql extensions. It has a mpq datatype that does what you want. Unfortunately my project required that it could be deployed on amazon RDS, which doesn’t include pgmp as a supported extension.

Luckily rolling out your own Fraction library that behaves like a native type is not only possible but not too difficult thanks to the excellent extendability of postgresql.

The full code can be found here.

Most of it is very straightforward.

Probably the most interesting part is bringing fractions to the same nominator for addition/multiplication. For this I’ve implement eucledian’s greatest common divider algorithm to implement least common demoninator to implement simplify which brings a fractions to it’s simplest form.

/*greatest common divider eucledian algorithm*/
CREATE OR REPLACE FUNCTION GCDiv(a INTEGER, b INTEGER) 
RETURNS INTEGER AS 
$$ 
DECLARE 
    c INTEGER; 
BEGIN 
    IF (a IS NULL OR b IS NULL OR (a = 0 AND b = 0)) THEN 
        RETURN 0;
    END IF;
    RAISE NOTICE 'a: %', a;
    RAISE NOTICE 'b: %', b;
    IF (a = 0 OR b = 0) THEN 
        RETURN ABS(a) + ABS(b); 
    END IF; 
    IF (ABS(a) < ABS(b)) THEN c = ABS(a); a = ABS(b); b = c; ELSE a = ABS(a); b = ABS(b); END IF; c = a % b; WHILE c > 0 LOOP 
        a = b; 
        b = c; 
        c = a % b; 
    END LOOP; 
    RETURN b; 
END; 
$$ LANGUAGE 'plpgsql';

/*Lowest common denominator*/ 
CREATE OR REPLACE FUNCTION LCDenom(a INTEGER, b INTEGER) 
RETURNS INTEGER AS 
$$ 
BEGIN 
    RETURN a / GCDiv(a, b) * b; 
END; 
$$ LANGUAGE 'plpgsql'; 

/*Simplifies fraction to canonical form*/
CREATE OR REPLACE FUNCTION simplify(a Fraction) RETURNS Fraction AS 
$$ 
DECLARE 
    gcd INTEGER; 
BEGIN 
    LOOP 
        gcd = GCDiv(a.n, a.d); 
        EXIT WHEN gcd = 0; 
        EXIT WHEN gcd = 1; 
        a.n = a.n / gcd; 
        a.d = a.d / gcd; 
    END LOOP; 
    IF (a.d < 0) THEN /*always remove the minus FROM the denominator, !!other gt/gte/lt/lte relies on thsi behaviour*/
        a.n = -a.n;
        a.d = -a.d;
    END IF;
    RETURN a;
END; 
$$ LANGUAGE 'plpgsql'; 

Arithmetically it’s sufficiently efficient to not cause any problems I think.

Then you have your basic operations which are relevant to fractions: + – * / power abs. And your comparison, operators =, >, >=, <, <=. Postgresql enforces operator preference on it’s own, so they should probably work as you’d expect.

If you have any performance issues, consider removing the equality operator. It will then recert to default exact equality and no longer detect 1/2 to be equal to 2/4, but your application may not need this if all your fractions stay in canonical form. And since equality is such a common operation within queries, that change alone will greatly increase performance.

Finally it implements the sum, min, max, avg: aggregate functions. Avg may be of some interest as I’ve not found any good examples or docs to implement an aggregate function that keeps count of the elements. After some stupid attempts, I simply ended up writing this:

CREATE OR REPLACE FUNCTION calculate_avg(arr Fraction[]) RETURNS Fraction AS 
$$ 
DECLARE 
    sum Fraction; 
    el Fraction;
BEGIN 
    sum = (0,1); 
    FOREACH el IN ARRAY arr LOOP 
        sum = sum + el; 
    END LOOP; 
    sum.d = sum.d * array_length(arr, 1);
    sum = simplify(sum); 
    RETURN sum; 
END; 
$$ LANGUAGE 'plpgsql'; 

CREATE AGGREGATE avg(Fraction) (
    sfunc = array_append,
    stype = Fraction[],
    initcond = '{}',
    finalfunc = calculate_avg
);

Basically it just builds an array with array_append and then uses that to calculate the avg. It seems so obvious now. But my lack of understanding and the lack of good examples of the create aggregate directive had me guessing for a while.

Process a large javascript array asynchronously but in batches

Imagine I have a large amount of network requests to make or I’m restructuring a part from my database and I want to read several 1000s or even millions of rows, do something with them and then write them back to the database.

So I go over some kind of array with all the work to be done and start asynchronous functions to actually deal with it. This works great for time, but if I pull up the process list 20 mins later, I notice that my app is using all available memory or has crashed already and half the requests have timed out. So what happened ?

Basically it’s much quicker to queue work than to actually handle the work. An obvious solution is to do everything synchronously, only starting on the next element when the previous has finished. However this is also not ideal. My db is probably ok with 50 simultaneous queries, just not a 100.000.

So you can batch the work into batches of 50. This is much better but now the longest query in the end is left alone at the end of every batch and the db is mostly idle.

What we really want is 50 ongoing request at any given time, no more but also no less. The solution is simpler than you think using an approach similar to what we’ve used in our attempt to synchronize a loop with asynchronous functions.

function batch_loop(arr, n, fn, done) {
  var remaining = n;
  var i = 0;
  function loop() {
    remaining--;
    if (i < arr.length) {    
      fn(arr[i++], (e) => {
        remaining++;
        setImmediate(() => loop());
      });
    } else {
      if (remaining == 0) done();
    }
  }
  if (!done) {
    return new Promise(function(resolve) {
      done = resolve;
      for (var j = 0; j < n; j++) loop();
    })
  } else {
     for (var j = 0; j < n; j++) loop();
  }
}

As you can see very similar, we only start n loop functions at the same time and every time one finishes we start another one. Meanwhile we also keep track of how many ‘slots’ we have still remaining and if that’s all of them, then all tasks have finished and we’re done.

You can use it like this, setTimeout just representing a random asynchronous task:

var arr = [1,2,3,4,5,6,7,8,9,10]
batch_loop(arr, 3, (el, done) => {
    setTimeout(() => {
        console.log(el);
        done();
    }, 1000)
}, () => {
    //continue
});

Or

var arr = [1,2,3,4,5,6,7,8,9,10]
batch_loop(arr, 3, (el, done) => {
    setTimeout(() => {
        console.log(el);
        done();
    }, 1000)
}).then(() => {
    //continue
});

Synchronize loops with asynchronous functions in javascript

Javascript uses a lot of asynchronous functions, which is great. But sometimes you just want to wait for an asynchronous function to finish before you can move on.

So we resort to promises:

new Promise((resolve) => {
    some_async_function(data, (err, done) => {
        resolve();
    });
}).then(() => {
    //continue
})

Or when chaining them:

new Promise((resolve) => {
    some_async_function(data, (err, done) => {
        resolve();
    });
}).then(() => {return new Promise((resolve) => {
    some_other_async_function(data, (err, done) => {
        resolve();
    });
})}).then(() => { 
    //continue 
}) 

A scary number of brackets, but it works well enough.

When we want to do some operations in parallel, but wait for the results, we can gather promises.

var promises = []
for (let el of arr) {
    promises.push(new Promise((resolve) => {
        some_async_function(el, (err, done) => {
            resolve();
        });
    }))
}
Promise.all(promises).then(() => {
    //continue
})

But what if we have a loop, with an asynchronouse function, but we want to wait for the last element to be finished before starting the next. neither of the previous solutions help us here. After thinking about it for a little bit you probably end up with some kind of a recursive solution.

function loop(i) {
    if (i < arr.length) {
        some_async_function(el, (err, done) => {
             //do some work
             loop(i+1)
        }
    } else {
        //continue
    }
}
loop(0);

The problem with this code is because loop, keeps calling loop, the callstack grows and grows. Although it’s prime candidate for tail recursion optimization, at the time of this writing node and most browsers do not implement this and will crash if arr exceeds a 1000 elements or so.

The solution is to put the recursive call of loop onto the event queue:

function loop(i) {
    if (i < arr.length) {
        some_async_function(el, (err, done) => {
             //do some work
             setImmediate(() => { loop(i+1) });
        }
    } else {
        //continue
    }
}
loop(0);

If setImmediate is not supported you can also use setTimeout with a timeout of 0;

This works but is far from pretty, so for my code I’ve added a library function that does this automatically:

function sync_loop(arr, fn, done) {
    function loop(i) {
        if (i < arr.length) {
            fn(arr[i], () => {
                setImmediate(() => loop(i+1));
            });
        } else {
            done();
        }
    }
    if (!done) {
        return new Promise(function(resolve) {
            done = resolve;
            loop(0);
        })
    } else {
        loop(0);
    }
}

Note that if no done function is supplied, I return a Promise instead.

You would use it like this:

var arr = [1,2,3,4,5,6,7,8,9,10]
sync_loop(arr, (el, done) => {
    setTimeout(() => {
        console.log(el);
        done();
    }, 1000)
}, () => {
    //continue
})

Or if you don’t supply a done function, it will return a promise, which you can use like this:

var arr = [1,2,3,4,5,6,7,8,9,10]
sync_loop(arr, (el, done) => {
    setTimeout(() => {
        console.log(el);
        done();
    }, 1000)
}).then(() => {
    //continue
})

As a final note, I’ve not actually added any error handling, I didn’t want to make too much assumptions on the passed function. However since it’s basically a synchronous function, it is safe to throw from inside the asynchronous without calling done to interrupt the loop.

I considered something like this:

function sync_loop(arr, fn, done) {
    var reject;
    function loop(i) {
        if (i < arr.length) {
            fn(arr[i], (e) => {
                if (e) reject(e);
                else setImmediate(() => loop(i+1));
            });
        } else {
            done();
        }
    }
    if (!done) {
        return new Promise(function(resolve, _reject) {
            reject = _reject;
            done = resolve;
            loop(0);
        })
    } else {
        reject = done;
        loop(0);
    }
}

This is more inline with how you handle errors in the asynchronous world.

var arr = [1,2,3,4,5,6,7,8,9,10]
sync_loop(arr, (el, done) => {
    setTimeout(() => {
        console.log(el);
        if (el > 5) done("error")
        else done();
    }, 1000)
}, (e) => {
    if (e) {
        //deal with error
        return;
    }
    //continue
})

Or

var arr = [1,2,3,4,5,6,7,8,9,10]
sync_loop(arr, (el, done) => {
    setTimeout(() => {
        console.log(el);
        if (el > 5) done("error")
        else done();
    }, 1000)
}).then(() => {
    //continue
}).catch((e) => {
    //deal with error
})

The choice is up to you really.

Postgresql and javascript dates

Dates in javascript are internally stored as a number of milliseconds since 00:00 UTC, January 1st 1970 or posix time / 1000, you usually access this through a DATE object. In postgresql the data is technically stored similarly but is accessible as either DATE (WITHOUT TIME ZONE) or TIMESTAMP (WITHOUT TIME ZONE). You can ofcourse store dates differently but storing dates and times in their native format has the advantage that you can use all the native functions on both platforms.

Switching between the two though poorly documented.

First of all timezones are something you deal with at the front-end, it has no place in a database back-end, so I recommend using the WITHOUT TIME ZONE variants.

Now to put a date from javascript into postgresql is easy enough the TO_TIMESTAMP function puts a posix time (00:00 UTC, January 1st 1970), so it’s a matter of taking a javascript timestamp / 1000.

The other way around you can use “EXTRACT(EPOCH FROM $1)” to output a date as posix time.

To make things more explicit, I just add these two functions to my db to do the conversion.

CREATE OR REPLACE FUNCTION from_posix_time(double precision) RETURNS timestamp without time zone AS 'SELECT to_timestamp($1) AT TIME ZONE ''UTC'';' LANGUAGE SQL;
CREATE OR REPLACE FUNCTION to_posix_time(timestamp without time zone) RETURNS double precision AS 'SELECT EXTRACT(EPOCH FROM $1);' LANGUAGE SQL;
CREATE OR REPLACE FUNCTION from_js_time(double precision) RETURNS timestamp without time zone AS 'SELECT to_timestamp($1/1000) AT TIME ZONE ''UTC'';' LANGUAGE SQL;
CREATE OR REPLACE FUNCTION to_js_time(timestamp without time zone) RETURNS double precision AS 'SELECT EXTRACT(EPOCH FROM $1)*1000;' LANGUAGE SQL;

Now I can do the conversion right there in the query, which is after all the glue that puts the database and the back-end together.

SELECT to_js_time(some_time) FROM some_table;
UPDATE some_table SET some_time = from_js_time(js_time);

 

 

Social login with Node.js

You may have noticed this website supports a couple of social login options: Google, Facebook, Twitter, Steam, VK, Battletnet and WG. I think it’s an excellent way to uniquely identify users without all the hassle of dealing with registration and the potential security risks of dealing with passwords.

Today I’ll talk you through how your Node/Express website can support these login options. Accompanying this blog post is this github project. It’s a small demo project that implements a simple Node/express website with session support and the logins above.

 

First a little background info

Logging in with these services is only possible because they implement some kind of API to verify a user’s identity. To be exact Google, Facebook, VK, Bnet implement OAauth2. Twitter implements OAauth1. And Wargaming and Steam implement OpenID.

The implementations are all slightly different but broadly it comes down to this:

  1. My website wants information about you.
  2. You are redirected by my website to the authorization server (e.g.: Google). Part of the request is a token to proof that my website sent you and what info I want access to.
  3. If you authorize that access, the authorization server (e.g.: Google) redirects you back to my website. In case of OpenID it includes proof of your identity. In case of OAauth I get a code I can exchange for a token that gives me a certain amount of access to your account. In case of my website I ask for read access to your username and display name if it is different from your username.

So from the position of my website I need to do 3 things for this to work.

  1. I need to register my website at the authorization server (i.e.: google) as a client. It’s a manual process mostly. These services have all some kind of developer console where I can register my website.
  2. I need a page that will redirect you to the authorization server and where I can start the authentication. For my website it’s ‘/auth/google’, ‘/auth/facebook’, … .
  3. I need a landing page where the authorization server can redirect you to after you’ve authorized access and where I can finish authentication. For my website that’s ‘/auth/google/callback’, ‘/auth/facebook/callback’, … .

 

Step 1

For the first step you will need to register your website at the different authorization servers. They are all a little different, but in general, look for some kind of developer page, create a new app.

Within the settings for that app you’ll have to register the callbacks url(s). (e.g.: For google you’ll need register http://hostname/auth/google/callback). It’s always a good idea to register localhost as well so you can do some local testing.

After registering your app you’ll find an app id and a secret, you’ll need those for step 2.

 

Step 2

I’ve implemented a small demo project here. It implements a basic Node/Express website with session support and the different login options. Just fill in the app ids/secrets of the login options you want to support and give it a try.

 

Dealing with different domains

Now that all seems pretty straightforward, but it becomes a little more complicated if you have different domains and or subdomains. en.gametactic.org is unfortunately not the same as fi.gametactic.org and definitely not the same as wottactic.com. And registering the callback urls for all of the is at best a hassle, at worst impossible (Facebook/Twitter)

When dealing with subdomains, you can just put the callback on the main domain. Then make sure your session cookie is a domain cookie. I’ve already implemented this in the example.

When dealing with different domains, there is also the option of creating a redirection page. For example you can change the callback url to “http://some_fixed_domain/callback_redirect.html?redirect=real_callback_domain”. Then make callback_redirect.html something in the lines of this. It just takes the url from the redirect url parameter and redirects there, while copying all the other url parameters, so login succeeds.

The battlenet login is even more picky than that. First of all it only allows 1 redirect url, that redirect url has to be on a ssl/https server. And lastly it doesn’t allow extra url parameters in the callback url, so the fix above doesn’t work. If however the initial request has a state parameter, it will copy that state parameter to the callback url. So I can pass the real callback url in the state parameter and do the redirect trick from above on a page on an https server.

Anyway, that should get you started if you want to add some new login options to your brand new website/app.

 

 

 

 

Between the bits

It was about time wottactic/gametactic had a bit of a comment/forum platform. And I thought it might be interesting to start a bit of a programming/technology blog.

So I introduce “between the bits”, has a nice ring to it, don’t you think.

So what better way to start than how I set things up.

First of all the goal was to integrate the blog and forum with the existing http://gametactic.org. That website however is build using Node.js (server side javascript)/Express(web server library)/Mongodb(OO database). These are great tools to create an interactive web application, but the libraries to easily create a blog/forum either don’t exist or are not very mature.

So instead I went out there, got some nice cheap shared LAMP (Linux/Apacche/Mysql/PHP) hosting and setup a wordpress.org/bbpress blog/forum combo instead.

So now I got two different websites on two different servers on different continents that need to work together. So now I have two problems to fix. They need to look the part and they need to share a single login.

Now I’m going to talk about integrating the login today. It’s surprisingly simple. There’s 2 parts.

 

Acquiring your login information

This step will very much depend on the setup of your original website.

For me it works like this. My original website keeps track of users, using a cookie with the sessions ID. Now since it’s a domain cookie and the forum is on a subdomain, I can simply read the session id from the cookie, connect to the db server that stores the session info remotely.

In my setup sessions are stored in a redis database and the code to access it looks something like this:

$session_cookie = $_COOKIE[connect_sid];
if (is_null($session_cookie)) return; //no session, nothing to do

$key = ‘sess:’.substr($session_cookie, 2, strpos($session_cookie, ‘.’)-2);

global $redis_client;
if (!$redis_client || !$redis_client.isConnected()) {
    $redis_client = new Predis\Client([
        ‘scheme’ => ‘tcp’,
        ‘host’   => REDIS_HOST,
        ‘port’   => REDIS_PORT,
        ‘password’ => REDIS_PASSWORD,
        ‘persistent’ => true
    ]);
}

$session_data = json_decode($redis_client->get($key));
$user_info = $session_data->passport->user;

Pretty simple, using predis a redis library for php.

Now when you come from gametactic.org and you go to forum.gametactic.org, I know who you are.

 

Logging you in

Now that I know who you are I can check if you are logged in. And if you are I can log you in on the forum as well. Or if you’ve never been on the forum before, I’ll create a new account.

Using wordpress, it looks something like this:

if ($user_info->identity) {
    //If you made it to this point, you are are logged in on the other website but not here
    $user_login = $user_info->identity;
    $user_id = username_exists($user_login);
    if (!$user_id) {
        $random_password = wp_generate_password($length=12);
        $userdata = array(
            ‘user_login’  =>  $user_login,
            ‘user_pass’   =>  $random_password,
            ‘display_name’ =>  $user_info->name
        );
        $user_id = wp_insert_user($userdata);
    }
    wp_set_current_user($user_id, $user_login);
    wp_set_auth_cookie($user_id);
    do_action(‘wp_login’, $user_login);
}

Notice I actually generate a 12 character random password. But nobody will ever see it really, I just log you in programmatically.

Now, is it secure ?

Well basically it accepts the session key as your authentication. But that’s true for almost login systems after of the point of login, think of it more as sharing a session. Provided you use ssl on both hosts, I would be pretty confident.

You do of course need to make sure when you expose a db server to the internet, that you password protect it and it’s probably wise to only allow connections from known ip’s.