Postgres and Transparent Data Encryption (TDE)

Security has always been a major concern for enterprises. Especially if crucial information is stored in a database, you want strong security around it. Over the years, technologies have evolved to provide better solutions for this.

If you store very sensitive information, you typically keep it encrypted, so that even if somebody gains access to the system, they cannot view the information unless they are authorized.

For managing sensitive information, enterprises use multiple methods:

  1. Encrypting specific information.

PPAS users can use the DBMS_CRYPTO package, which provides a way of encrypting sensitive information in the database.

For more information, please refer to the following link:

For PostgreSQL, users can use the pgcrypto module.
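As a small illustration of pgcrypto, its symmetric PGP functions can encrypt a value on the way in and decrypt it on the way out. This is only a sketch; the table name, column, and key used here are made up:

```sql
-- Requires the pgcrypto extension:
CREATE EXTENSION pgcrypto;

-- Hypothetical table holding an encrypted column:
CREATE TABLE customer_secrets (id serial PRIMARY KEY, card_no bytea);

-- Encrypt with a symmetric key on insert:
INSERT INTO customer_secrets (card_no)
VALUES (pgp_sym_encrypt('4111-1111-1111-1111', 'my-secret-key'));

-- Decrypt when an authorized user reads it back:
SELECT pgp_sym_decrypt(card_no, 'my-secret-key') FROM customer_secrets;
```

Only sessions that can supply the key can read the plaintext; everyone else sees only the bytea ciphertext.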

  2. Transparent Data Encryption (TDE) is another method, employed by both Microsoft and Oracle, to encrypt database files. TDE offers encryption at the file level. This method solves the problem of protecting data at rest, i.e. encrypting databases both on the hard drive and consequently on backup media. Enterprises typically employ TDE to address compliance requirements such as PCI DSS.

Postgres Plus currently doesn’t have built-in TDE; however, if enterprises are looking for encryption at the database file level, they can use one of the following methods for protecting data at rest:

  1. Full Disk Encryption:

Full disk or partition encryption is one of the best ways of protecting your data. This method protects not only each file but also the temporary storage that may contain parts of these files. Full disk encryption protects all of your files, so you do not have to worry about selecting what to protect and possibly missing a file.

RHEL (Red Hat) supports Linux Unified Key Setup-on-disk-format (LUKS), which bulk-encrypts hard drive partitions.

For more information on LUKS, please refer to the following link:

  2. File system-level encryption:

File system-level encryption is often called file/directory encryption. In this method, individual files or directories are encrypted by the file system itself.

Stackable cryptographic file system encryption is also available, which users can utilize in their environment.

File system-level encryption offers the following advantages:

  1. Flexible file-based key management: each file can be, and usually is, encrypted with a separate encryption key.
  2. Individual management of encrypted files, e.g. incremental backups of the individual changed files, even in encrypted form, rather than backups of the entire encrypted volume.
  3. Access control can be enforced through public-key cryptography, and cryptographic keys are held in memory only while the file decrypted by them is held open.

Stackable cryptographic file system encryption can be used with Postgres for Transparent Data Encryption.

In this blog, I will discuss using mount ecryptfs, as it requires less setup overhead (LUKS requires a new disk to be configured and formatted before storing data on it, while “mount ecryptfs” works with existing directories and data).

If enterprises want to give DBAs control over TDE, they can define a few sudo rules that allow the DBAs to execute the encryption commands.

The following is one method they can use:

  • Ask the system admin to create sudo rules that allow the DBA to encrypt the data directory for Postgres Plus. One common way to do this is using the “mount ecryptfs” command on Linux operating systems.
  • If the user needs to encrypt the /ppas94/data directory, they can use the following command:
sudo mount -t ecryptfs /ppas94/data /ppas94/data
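Such a sudo rule could look like the following sketch; the enterprisedb user name, the file name, and the paths are assumptions to adapt to your environment (edit with visudo):

```
# /etc/sudoers.d/ppas-tde (hypothetical file)
# Allow the enterprisedb OS user to mount/unmount the encrypted data directory
enterprisedb ALL=(root) NOPASSWD: /bin/mount -t ecryptfs /ppas94/data /ppas94/data, /bin/umount /ppas94/data
```

With this in place, the DBA can run the mount command via sudo without needing full root access.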

More information can be found in the documentation from RHEL:

Users can also specify the encryption key type (passphrase, openssl), cipher (aes, des3_ede, …), key byte size, and other options with the above command.

An example is given below:

# mount -t ecryptfs /home /home -o ecryptfs_unlink_sigs \
  ecryptfs_key_bytes=16 ecryptfs_cipher=aes ecryptfs_sig=c7fed37c0a341e19

CentOS 7 and RHEL 7 do not ship with ecryptfs by default; therefore, users can also use the encfs command.

For more information on encfs, please refer to the following link:

The following are the steps to use encfs to encrypt the data directory.

  1. Create a data directory using the following commands, as the enterprisedb user:
   mkdir /var/lib/ppas/9.4/encrypted_data
   chmod 700 /var/lib/ppas/9.4/encrypted_data
  2. Use the following encfs command to encrypt the data directory:
         encfs /var/lib/ppas/9.4/encrypted_data/ /var/lib/ppas/9.4/data

A snapshot of the above command is given below:

encfs /var/lib/ppas/9.4/encrypted_data /var/lib/ppas/9.4/data

The directory "/var/lib/ppas/9.4/data" does not exist. Should it be created? (y,n) y
Creating new encrypted volume.

Please choose from one of the following options:

enter "x" for expert configuration mode,

enter "p" for pre-configured paranoia mode,

anything else, or an empty line will select standard mode.


Standard configuration selected.

Configuration finished.  The filesystem to be created has

the following properties:

Filesystem cipher: "ssl/aes", version 3:0:2

Filename encoding: "nameio/block", version 4:0:2

Key Size: 192 bits

Block Size: 1024 bytes

Each file contains 8 byte header with unique IV data.

Filenames encoded using IV chaining mode.

File holes passed through to ciphertext.

Now you will need to enter a password for your filesystem.

You will need to remember this password, as there is absolutely

no recovery mechanism.  However, the password can be changed

later using encfsctl.

New Encfs Password: 

Verify Encfs Password: 

  3. After encrypting the data directory, users also need to modify the postgresql service script to include the proper command for supplying the password. For that, they can either use sshpass or write their own program which passes the password for mounting the directory.

As you can see, achieving Transparent Data Encryption in Postgres is very easy.

Meet BART – A New Tool for Backup And Recovery Management

EnterpriseDB recently launched a new tool for backup and recovery – named simply EDB Backup and Recovery Tool, or BART. This tool makes the DBA’s life easier by simplifying the tasks for managing their Postgres physical backup and recovery tasks, whether they are PostgreSQL or Postgres Plus Advanced Server deployments.

BART has the following advantages over custom scripts for managing backups:

1. It’s stable and it uses the tool pg_basebackup to take a physical backup. This tool has been well defined and is well-supported by the PostgreSQL community.

2. It catalogs all of the backups users are taking, which is important in terms of:
    i. Listing the type of backups used
   ii. Listing the status of those backups with server information.

3. BART also provides functionality to restore backups, with all required archived WAL files. So automation around this tool will make DBAs’ lives easier for restore and recovery.

4. BART provides an option to validate your backup by using checksum. This is useful for confirming you took a valid backup and it is not corrupted at disk level.

5. BART provides an option to define your retention policy around the backups you are keeping.

Given all of the above advantages, I decided to give this new tool a try and share some tips. To get started, you need the following prerequisites:

1. BART currently requires a 64-bit Linux platform: CentOS 6.x or RHEL 6.x
2. Password-less, direct SSH access to the target machine where you want to restore backups, as well as to the database servers you want backed up
3. The Postgres Plus Advanced Server or PostgreSQL binaries for pg_basebackup

Yum or rpm

To install this tool, you have two options that I will explore below:

1. Yum command
2. Rpm command.

Using the yum command:

To perform a yum installation, BART users can ask EDB for credentials to the EnterpriseDB yum repository and configure their local yum repository as follows:

echo "[tools]
name=EnterpriseDB Tools
gpgcheck=0" > /etc/yum.repos.d/edbtools.repo

After creating the yum repo, the user can execute the following command to install BART:

 yum install edb-bart

If the user doesn’t want to install the EDB Backup and Recovery Tool using yum, they can download a standalone rpm from EDB’s website using the link below:

and then enter the rpm install command as follows:

rpm -ivh edb-bart-1.0.1-1.rhel6.x86_64.rpm

After installing BART using the above commands, the user can find the binaries in /usr/edb-bart-1.0/bin and a sample BART configuration file in /usr/edb-bart-1.0/etc.

That’s a very easy installation.

For more information on configuring BART Host and Database Host, the following are some documents that will help:
1. pg_basebackup configuration for PostgreSQL:

2. For direct password-less ssh configuration, the user can refer to the following link:

After the installation of the BART binaries, the user also has to create a BART configuration file.

The following is a sample configuration file for BART:

[BART]
bart-host= enterprisedb@
backup_path = /opt/backup
pg_basebackup_path = /usr/ppas-9.4/bin/pg_basebackup
logfile = /tmp/bart.log

[PG]
host =
port = 5432
user = postgres
description = "Postgres server"

[PPAS94]
host =
port = 5444
user = enterprisedb
description = "PPAS 94 server"

Global Configuration Settings

The content under the [BART] tag comprises the global configuration settings. Under this tag are the following:

1. bart-host: the IP address of the host on which BART is installed. The value for this parameter must be specified in the form: bart_user@bart_host_address, where bart_user is the operating system user account on the BART host that is used to run BART and owns the BART backup catalog directory. bart_host_address is the IP address of the BART host.

2. backup_path: specifies the file system parent directory where all BART database server base backups and archived WAL files are stored. This parameter is required.

3. pg_basebackup_path: specifies the path to the pg_basebackup program of the Postgres database server installed on the BART host.

4. logfile: specifies the path to the BART log file. This parameter is optional. If no path to a log file is specified after logfile =, or if the parameter is commented out, BART does not create a log file.

The remaining part of the configuration file is self-explanatory: the [PG]/[PPAS94] tags introduce the settings for each server which the user wants to back up.

Pg_basebackup Settings

After performing the above configuration on the backup server, the user has to apply the following settings on the servers they want to back up. Below are the settings for enabling backup using pg_basebackup.

The user has to set a few parameters in the postgresql.conf file of the PostgreSQL instance to be backed up:

1. wal_level set to archive or hot_standby.
2. archive_mode = on
3. archive_command set appropriately.
4. max_wal_senders set to 1 or more, since pg_basebackup uses the replication protocol to copy the data directory.
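Put together, the relevant postgresql.conf lines might look like the following sketch; the archive destination path is an assumption:

```
# postgresql.conf (illustrative values)
wal_level = hot_standby                    # or 'archive'
archive_mode = on
archive_command = 'cp %p /opt/archive/%f'  # assumed archive destination
max_wal_senders = 2                        # pg_basebackup uses the replication protocol
```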

For more information on each setting please refer to the following:
1. wal_level:

2. archive_mode and archive_command:

3. max_wal_senders:

With the above settings, the user then needs to update the pg_hba.conf file for the replication connection.
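For example, a pg_hba.conf entry permitting a replication connection from the BART host could look like the following; the address and user name are assumptions:

```
# pg_hba.conf (illustrative entry)
# TYPE  DATABASE     USER          ADDRESS           METHOD
host    replication  enterprisedb  192.168.1.10/32   md5
```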

Note: The above settings are for pg_basebackup to take backups using the replication protocol. In case users need more information about pg_basebackup and its settings, please use the above-mentioned links.

How BART Works

Now, since we have configured both servers, let’s have a look how BART works.

The following command executes a backup:

 bart -c bart.cfg BACKUP -s ppas94

And below is the output:

[bart@localhost ~]$ bart -c bart.cfg BACKUP -s ppas94

INFO:  creating backup for server 'ppas94'
INFO:  backup identifier: '1413852137762'
6394456/6394456 kB (100%), 1/1 tablespace

INFO:  backup checksum: 7f49ea9653511308710c174f22ec765d
INFO:  backup completed successfully
[bart@localhost ~]$ 

That was an easy way to take a backup. The DBA can also create a job to execute the above command to take backups.

If the user wants to list the backup using BART, the user can use the option SHOW-BACKUPS:

[bart@localhost ~]$ bart -c bart.cfg SHOW-BACKUPS -s ppas94
 Server Name   Backup ID       Backup Time           Backup Size  
 ppas94        1413852137762   2014-10-20 17:43:41   6244.59 MB   

This is useful for knowing what backups a user has available for recovery. The above command gives important information:

1.	Backup ID: It’s a unique ID for the physical backup
2.	Backup Time: Time when backup was taken
3.	Backup Size: Size of backup

This information is useful when a user wants to plan for recovery using backup. This way, the user can also plan for disk size.

Sometimes a user wants to verify their backup state. VERIFY-CHKSUM option is useful in this case:

[bart@localhost ~]$ bart -c bart.cfg VERIFY-CHKSUM -s ppas94 -i 1413852137762
 Server Name   Backup ID       Verify  
 ppas94        1413852137762   OK      

I have to say, after putting EDB BART through its paces, I think DBAs will enjoy having such a great tool for making Backup Management easy.

In my next post, I will blog about the Recovery process.

Monitoring approach for Streaming Replication with Hot Standby in PostgreSQL 9.3.

The people using PostgreSQL and the Streaming Replication feature seem to ask many of the same questions:

1. How best to monitor Streaming Replication?

2. What is the best way to do that?

3. Are there alternatives, when monitoring on Standby, to using the pg_stat_replication view on Master?

4. How should I calculate replication lag-time, in seconds, minutes, etc.?

In light of these commonly asked questions, I thought a blog would help. The following are some methods I’ve found to be useful.

Monitoring is critical for large infrastructure deployments where you have Streaming Replication for:

1. Disaster recovery

2. High availability

3. Load balancing, when using Streaming Replication with Hot Standby

PostgreSQL has some building blocks for replication monitoring, and the following are some important functions and views which can be used for monitoring the replication:

1. pg_stat_replication view on master/primary server.

This view helps in monitoring the standby on Master. It gives you the following details:

   pid:              Process ID of the walsender process
   usesysid:         OID of the user used for streaming replication
   usename:          Name of the user used for streaming replication
   application_name: Application name connected to the master
   client_addr:      Address of the standby/streaming replication client
   client_hostname:  Hostname of the standby
   client_port:      TCP port on which the standby communicates with the WAL sender
   backend_start:    Start time when the standby connected to the master
   state:            Current WAL sender state, i.e. streaming
   sent_location:    Last transaction location sent to the standby
   write_location:   Last transaction written to disk at the standby
   flush_location:   Last transaction flushed to disk at the standby
   replay_location:  Last transaction replayed at the standby
   sync_priority:    Priority of the standby server for being chosen as the synchronous standby
   sync_state:       Sync state of the standby (async or sync)


postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+---------------------------------
pid              | 1114
usesysid         | 16384
usename          | repuser
application_name | walreceiver
client_addr      |
client_hostname  |
client_port      | 52444
backend_start    | 15-MAY-14 19:54:05.535695 -04:00
state            | streaming
sent_location    | 0/290044C0
write_location   | 0/290044C0
flush_location   | 0/290044C0
replay_location  | 0/290044C0
sync_priority    | 0
sync_state       | async

2. pg_is_in_recovery(): Function which tells whether the standby is still in recovery mode or not.


postgres=# select pg_is_in_recovery();
 t
(1 row)

3. pg_last_xlog_receive_location(): Function which tells the location of the last transaction log streamed by the standby and written to the standby's disk.


postgres=# select pg_last_xlog_receive_location();
(1 row)

4. pg_last_xlog_replay_location(): Function which tells the last transaction replayed during the recovery process. An example is given below:

postgres=# select pg_last_xlog_replay_location();
(1 row)

5. pg_last_xact_replay_timestamp(): This function tells the timestamp of the last transaction replayed during recovery. Below is an example:

postgres=# select pg_last_xact_replay_timestamp();
 15-MAY-14 20:54:27.635591 -04:00
(1 row)

Above are some important functions/views, which are already available in PostgreSQL for monitoring the streaming replication.

So, the logical next question is, “What’s the right way to monitor the Hot Standby with Streaming Replication on Standby Server?”

If you have Hot Standby with Streaming Replication, the following are the points you should monitor:

1. Check if your Hot Standby is in recovery mode or not:

For this you can use pg_is_in_recovery() function.

2. Check whether Streaming Replication is working or not.

An easy way of doing this is checking the pg_stat_replication view on the Master/Primary. This view gives information on the master only if Streaming Replication is working.

3. Check if Streaming Replication is not working and the Hot Standby is recovering from archived WAL files.

For this, the DBA can either use the PostgreSQL log file to monitor it or utilize the following functions provided in PostgreSQL 9.3:


4. Check how far off is the Standby from Master.

There are two ways to monitor lag for Standby.

   i. Lag in bytes: For calculating the lag in bytes, users can use the pg_stat_replication view on the master with the pg_xlog_location_diff function. Below is an example:

pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location)

which gives the lag in bytes.
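A complete query along these lines (a sketch to run on the master) would be:

```sql
SELECT client_addr,
       pg_xlog_location_diff(sent_location, replay_location) AS lag_bytes
  FROM pg_stat_replication;
```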

  ii. Calculating the lag in seconds. The following is the SQL most people use to find the lag in seconds:

   SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
                 THEN 0
               ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
          END AS log_delay;

Including the above into your repertoire can give you good monitoring for PostgreSQL.

In a future post, I will include the script that can be used for monitoring the Hot Standby with PostgreSQL streaming replication.

New in PostgreSQL 9.3: New in Functions

In the series of blogs about new features in 9.3, today I thought about covering the new functions and improvements coming in PostgreSQL.

Let's look at what's new in 9.3 in terms of built-in functions:

1. New array functions for one-dimensional arrays

PostgreSQL 9.3 comes with new functions which help users manipulate one-dimensional arrays by calling simple functions, in place of crafting their own functions and methods to do the modification.

i. array_remove function

This is a new function added in 9.3 which provides the ability to remove elements from an array. The function takes two arguments:
a. the one-dimensional array from which the user wants to remove elements
b. the element value which the user wants to remove

The syntax of this function is given below:

ARRAY_REMOVE(<one dimensional array>, element)

An example of array_remove is given below:

postgres=# select array_remove(ARRAY['First','Second','Delete','Four'],'Delete');
 {First,Second,Four}
(1 row)

ii. array_replace function

This is new in 9.3; it helps the user replace any element in an array. This function can be used for multi-dimensional arrays or for single-dimensional arrays.
An example of array_replace is given below:

pgsqltest=# select array_replace(ARRAY[ARRAY[1,2],ARRAY[3,4],ARRAY[56,6],ARRAY[7,8]],56,5);
 {{1,2},{3,4},{5,6},{7,8}}
(1 row)

2. VARIADIC-labeled argument expansion for the concat and format functions

This is a new addition to the concat and format functions. Pre-9.3, the concat and format functions used to ignore the VARIADIC label and didn't print the right output. However, this has been fixed in 9.3. Let's look at examples from pre-9.3 and 9.3.

In pre-9.3

worktest=# select concat(variadic array[1,2,3]);
 {1,2,3}
(1 row)

In 9.3

pgsqltest=# select concat(variadic array[1,2,3]);
 123
(1 row)

You can see above that in 9.3, the VARIADIC label works properly.

Let's look at the format() function:

In pre-9.3

worktest=# select format(string_agg('%s',','), variadic array_agg(i)) from generate_series(1,3) g(i);
ERROR:  too few arguments for format

Whoa, it errors out. However, this is fixed in 9.3 :-).

In 9.3

pgsqltest=# select format(string_agg('%s',','), variadic array_agg(i)) from generate_series(1,3) g(i);
 1,2,3
(1 row)

3. Improvements in the format() function to handle field width and left/right alignment

These new additions in 9.3 are going to increase the usability of the format() function for developers. In 9.3, the format function comes with the following enhancements:
i). Proper handling of field width.
ii). Proper handling of left/right alignment.

The above two improvements were missing in pre-9.3. Their addition makes the format() function a follower of the sprintf() C function.

Let's look at an example.
In pre-9.3, use of a field width resulted in an error message, as given below:

worktest=#  select format('>>%10s<<', 'Hello');
ERROR:  unterminated conversion specifier

However, 9.3 comes with proper field width support.

pgsqltest=#  select format('>>%10s<<', 'Hello');
 >>     Hello<<
(1 row)

9.3 also comes with proper handling of left/right alignment. Examples are given below:
Left alignment

pgsqltest=# select format('>>%-10s<<', 'Hello');
 >>Hello     <<
(1 row)

Right alignment

pgsqltest=# select format('>>%1$10s<<', 'Hello');
 >>     Hello<<
(1 row)

4. Proper handling of negative centuries in the to_char, to_date and to_timestamp functions

In pre-9.3, the behavior of the following functions for negative centuries was wrong or inconsistent:
a. to_char
b. to_date
c. to_timestamp

However, 9.3 comes with proper handling for negative centuries. Let's look at the output in pre-9.3 and in 9.3 for each function.


worktest=# select to_timestamp('4713-01-08 BC','YYYY-DD-MM bc');
 01-AUG-13 00:00:00 -04:56:02
(1 row)

Above you can see that it displays the wrong result for BC. In 9.3:

pgsqltest=# select to_timestamp('4713-01-08 BC','YYYY-DD-MM bc');
 4713-08-01 00:00:00-04:56:02 BC
(1 row)

Let's look at the to_date function. In 9.2/pre-9.3:

worktest=# SELECT to_date('4713-01-08 BC','YYYY-DD-MM bc');
ERROR:  full year must be between -4712 and +9999, and not be 0

In 9.3

pgsqltest=# SELECT to_date('4713-01-08 BC','YYYY-DD-MM bc');
 4713-08-01 BC
(1 row)

Above, you can see that in 9.3 the to_date function works correctly, whereas to_date in 9.2 wasn't able to handle it. You can see similar behavior with the to_char function.

5. Improvement in pg_get_viewdef() to print a new line after each SELECT and FROM entry

This is an enhancement made to the pg_get_viewdef function in 9.3, and users will also see it in pg_dump plain-format dumps.

This enhancement improves the readability of a view definition in 9.3 and reduces the line length of the definition. Let's look at how this improvement makes the experience of viewing a view definition much better.

In pre-9.3:

worktest=# select pg_get_viewdef('pg_tables'::regclass);
 SELECT n.nspname AS schemaname, c.relname AS tablename, pg_get_userbyid(c.relowner) AS tableowner, 
t.spcname AS tablespace, c.relhasindex AS hasindexes, c.relhasrules AS hasrules, c.relhastriggers AS
 hastriggers FROM ((pg_class c LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) LEFT JOIN pg_
tablespace t ON ((t.oid = c.reltablespace))) WHERE (c.relkind = 'r'::"char");
(1 row)

Above you can see that pg_get_viewdef printed the definition of the view on one line, which not only makes the view definition difficult to read but also increases the line length.

Let's look at pg_get_viewdef in 9.3:

pgsqltest=# select pg_get_viewdef('pg_tables'::regclass);
  SELECT n.nspname AS schemaname,                            +
     c.relname AS tablename,                                 +
     pg_get_userbyid(c.relowner) AS tableowner,              +
     t.spcname AS tablespace,                                +
     c.relhasindex AS hasindexes,                            +
     c.relhasrules AS hasrules,                              +
     c.relhastriggers AS hastriggers                         +
    FROM ((pg_class c                                        +
    LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))  +
    LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))+
   WHERE (c.relkind = 'r'::"char");
(1 row)

This seems better to me in terms of readability, and it doesn't have long lines.


pg_xlog_location_diff function for PostgreSQL/PPAS

In PostgreSQL 9.2, the community added the function pg_xlog_location_diff(), which is very useful for finding the difference between two xlog locations in bytes, and also useful for monitoring replication. Details of this function are given in the following link:

However, this function is not available to PostgreSQL/PPAS 9.0/9.1 users. So, I thought to write the same function in plpgsql so that users can take advantage of it in 9.0/9.1.

Before using the formula and developing the function, let's understand what xlog and offset are.
Let's consider that a user has called the pg_current_xlog_location() function and gets the following information:

worktest=# select pg_current_xlog_location();
(1 row)

In the above, the first field before the forward slash is the hexadecimal value of the logical xlog file, and the second field, i.e. D1012B80, is the hexadecimal offset inside the logical xlog file.

So, whenever the user sees information for an xlog location, he gets the xlog information in the following format:

(hexadecimal) xlog/ (hexadecimal) offset

To calculate the difference between two xlog locations, the user can use the following formula, which gives the difference in hexadecimal:

(FF000000 * xlog1 + offset1) - (FF000000 * xlog2 + offset2)

where FF000000 is the maximum value of the offset, i.e. in PostgreSQL/PPAS the offset value can go from 00000000 to FF000000, which in decimal gives a maximum of 4278190080 bytes (4080MB).
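The formula can be sanity-checked outside the database with a small shell sketch (the xlog_diff helper name is made up):

```shell
# Compute the byte difference between two xlog locations of the form XLOG/OFFSET
xlog_diff() {
  local a_xlog=${1%%/*} a_off=${1##*/}   # split the first location at '/'
  local b_xlog=${2%%/*} b_off=${2##*/}   # split the second location at '/'
  echo $(( (0xFF000000 * 0x$a_xlog + 0x$a_off) - (0xFF000000 * 0x$b_xlog + 0x$b_off) ))
}

xlog_diff 1/D009F578 0/290044C0
```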

Based on the above formula, the following is a plpgsql function which can be used to get the difference:

CREATE OR REPLACE FUNCTION pg_xlog_location_diff_sql(text, text)
 RETURNS numeric
 LANGUAGE plpgsql
AS $$
DECLARE
       offset1 text;
       offset2 text;
       xlog1 text;
       xlog2 text;
       SQL text;
       diff text;
BEGIN
       /* Extract the offset and xlog from the inputs into
          the offset and xlog variables */
       xlog1   := split_part($1, '/', 1);
       offset1 := split_part($1, '/', 2);
       xlog2   := split_part($2, '/', 1);
       offset2 := split_part($2, '/', 2);
       /* Prepare SQL query for calculation based on the formula
          (FF000000 * xlog1 + offset1) - (FF000000 * xlog2 + offset2),
          which gives the value in hexadecimal. Since hexadecimal calculation is cumbersome,
          convert into decimal and then calculate the difference */
       SQL='SELECT (x'''||'FF000000'||'''::bigint * x'''||xlog1||'''::bigint 
                                +  x'''||offset1||'''::bigint)'||' 
                 - (x'''||'FF000000'||'''::bigint * x'''||xlog2||'''::bigint 
                                +  x'''||offset2||'''::bigint)';
       EXECUTE SQL into diff;
       /* Return the value in numeric by explicit casting */
       RETURN diff::numeric;
END;
$$;

A usage example is given below:

worktest=# select pg_xlog_location_diff_sql(pg_current_xlog_location(),'1/D009F578');
(1 row)

I hope this will help PostgreSQL/PPAS 9.0/9.1 users.

Automating Binding of Servers with PEM Agent in Postgres Enterprise Manager 2.1.

This is second post of Postgres Enterprise Manager 2.1 series.

A question people often ask: is Postgres Enterprise Manager a useful tool for companies which provide services for PostgreSQL? The answer is yes. What about companies which provide infrastructure support, supplying the server and the PostgreSQL database: is it a good tool for them to monitor PostgreSQL? The answer again is yes, you can use it.

Companies which give infrastructure support and provide Database as a Service always want everything automated, i.e. along with provisioning a server, the server should have PostgreSQL and its components installed, plus they want automatic installation of the pemagent, which is also achievable. However, they get stumped on automatic binding of the PEM Agent with PostgreSQL.

The simple method of binding PostgreSQL with the PEM Agent is using the PEM Client:
in the PEM Client, add a PostgreSQL server to the PEM Directory, and then the user can bind the PEM agent with the PostgreSQL cluster.

File -> Add Server -> which will pop up the "New server Registration" screen -> after adding the server information (also click on "Store PEM Server")

Then in the "New server Registration" window ->

 Click on the "PEM Agent" tab -> choose an agent from the "Bound Agent" drop-down; then the user can include the information.

The above is one method. However, a company which provides infrastructure support doesn't want to do this manual work; they want this activity automated.

For automating this process, the user can do the following:
1. Install the PEM agent in unattended mode. For details of installing the pemagent in unattended mode, the user can use a command like:

./ --prefix /opt/PEM --mode unattended --pghost <pem server hostname> --pguser <pem server username and password> --agent_description <Agent Hostname>

2. Create a .pgpass file in the root/admin user's home directory. (On Linux, the root home directory is /root.)

# cat .pgpass
*:5432:*:postgres:<your password>

Change the permissions on .pgpass:

chmod 600 .pgpass

3. After creating the .pgpass file in the home directory, execute the following SQL statements:

/opt/PEM/agent/bin/edb-psql -h <PEM Server hostname> -p 5432 -c "
INSERT INTO pem.server(description,server,port,database) VALUES('New server','IP Address',5432,'postgres');" -U postgres

/opt/PEM/agent/bin/edb-psql -h <PEM Server hostname> -p 5432 -c "INSERT INTO pem.server_option(server_id,pem_user,username,server_group) VALUES((SELECT id FROM pem.server WHERE server='IP address' AND active=true),'postgres','postgres','PEM Server Directory');" -U postgres

/opt/PEM/agent/bin/edb-psql -h <PEM Server hostname> -p 5432 -c "INSERT INTO pem.agent_server_binding(agent_id,server_id,server,port,username,database)
VALUES((SELECT id FROM pem.agent WHERE description ='Agent Hostname' AND active=true),(SELECT id FROM pem.server WHERE server='IP address' AND active=true),'IP address',5432,'postgres','postgres');" -U postgres

After executing the above SQL statements, you have successfully bound the agent with the server for monitoring.

The user can also include the above steps in a shell script and execute it while provisioning a new server for their client.

Have Fun!

Partition Syntax Support in PPAS 9.1

In the series on new features in Advanced Server 9.1, today I thought to write about the partition table syntax supported in PPAS 9.1. In PostgreSQL, and up to PPAS 9.0, the user has to follow the method given below for partitioning a table:
1. Create a parent table.
2. Create child tables using the inheritance feature.
3. Create a trigger on the parent table, so that the trigger can redirect inserts to the right partition.
4. And if the user has to add a new child table, steps 2 and 3 have to be done again.

Now, in PPAS 9.1, the user doesn't have to perform the above activities: PPAS 9.1 supports PARTITION TABLE syntax.

Let's see how the PARTITION syntax in PPAS can make users'/DBAs' lives easier.

We know that PPAS/PostgreSQL supports two types of partitioning (range and list). So, we will see how simple it is with the CREATE PARTITION syntax.
Let's create a RANGE partition as we used to do in PPAS 9.0/PostgreSQL:
1. Create the master table and child tables as given below:

CREATE TABLE partition_master(id numeric primary key,val text);
CREATE TABLE partition_child1(CHECK (id > 0 and id <=10)) inherits(partition_master);
CREATE TABLE partition_child2(CHECK (id >10 and id <=20)) inherits(partition_master);

2. Then add primary keys on both child tables:

alter table partition_child1 add primary key (id);
alter table partition_child2 add primary key (id);

3. Create Trigger, which redirect Inserts to correct child table:

CREATE OR REPLACE FUNCTION part_trig_insrt() RETURNS trigger
AS $$
BEGIN
      IF NEW.id > 0 AND NEW.id <= 10 THEN
           INSERT INTO partition_child1 VALUES(NEW.*);
      ELSIF NEW.id > 10 AND NEW.id <= 20 THEN
           INSERT INTO partition_child2 VALUES(NEW.*);
      ELSE
           RAISE 'inserted partition key doesn''t map to any partition';
      END IF;
      RETURN NULL;
END;
$$ language plpgsql;
CREATE TRIGGER partition_trig_insert BEFORE INSERT ON partition_master FOR EACH ROW execute procedure part_trig_insrt();

Similarly, you have to write a trigger which can handle updates of the partition key.

And whenever the user wants to add a new partition, he has to update the trigger function, create the new partition table, and so on.
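To make the trigger's behavior concrete, here is a small illustrative sketch (in Python, outside the database, not how PPAS implements it) of the routing logic such a trigger encodes: each child covers a half-open range of the partition key, and a key outside every range is an error:

```python
# Illustrative sketch of the trigger's routing logic: each partition
# covers a half-open range (lower, upper] of the partition key.
PARTITIONS = [
    ("partition_child1", 0, 10),   # id > 0  and id <= 10
    ("partition_child2", 10, 20),  # id > 10 and id <= 20
]

def route(partition_key):
    """Return the child table whose range covers the partition key."""
    for name, lower, upper in PARTITIONS:
        if lower < partition_key <= upper:
            return name
    raise ValueError("inserted partition key doesn't map to any partition")
```

Adding a partition under the manual scheme means editing this routing table and the trigger together, which is exactly the maintenance burden the new syntax removes.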

Now, from PPAS 9.1 onwards, the user can run a single command to create the partitions and PPAS will take care of everything, as given below:

CREATE TABLE partition_master (id numeric primary key,val text)
PARTITION BY RANGE (id)
(PARTITION partition_child1 VALUES LESS THAN (11),
PARTITION partition_child2 VALUES LESS THAN (21));

edb=# insert into partition_master values(1,'First');
edb=# insert into partition_master values(11,'Eleventh');
edb=# select * from partition_master;
 id |   val    
----+----------
  1 | First
 11 | Eleventh
(2 rows)

edb=# select * from only partition_master_partition_child1;
 id |  val  
----+-------
  1 | First
(1 row)

edb=# select * from partition_master_partition_child2;
 id |   val    
----+----------
 11 | Eleventh
(1 row)

That was easy: with a single command you have your partitioned table in PPAS.

Suppose the user later wants to add one more partition; then he can execute the following single command:

ALTER TABLE partition_master add partition partition_child3 VALUES LESS THAN (31);

edb=# insert into partition_master values(30,'Thirty');
edb=# select * from partition_master;
 id |   val    
----+----------
  1 | First
 11 | Eleventh
 30 | Thirty
(3 rows)

edb=# select * from partition_master_partition_child3;
 id |  val   
----+--------
 30 | Thirty
(1 row)

That's simple, isn't it?

Along with this, the PPAS 9.1 partition syntax also allows swapping an existing table with a partition or subpartition, as given below:

CREATE TABLE single_table (id numeric primary key,val text);
insert into single_table select i, 'val '||i from generate_series(1,10) i;

ALTER TABLE partition_master 
EXCHANGE PARTITION partition_child1
WITH TABLE single_table;

Other partition-management syntaxes are supported as well, such as ALTER TABLE ... DROP PARTITION, SPLIT PARTITION, TRUNCATE PARTITION, and RENAME PARTITION.

Support for the above partition syntaxes has really made management of partitioned tables easier in PPAS 9.1!

New Features in Slony 2.1

Slony 2.1 was released on 19th Oct 2011. So, I thought to look at some important improvements in this new release which can make users' life easier.

Before discussing the changes, let's set up Slony replication.

The following code can be used for setting up Slony replication.

####   Preamble Script:

cluster name=slonytest;
NODE 1 ADMIN CONNINFO = 'dbname=postgres host=localhost user=postgres port=5432 password=postgres';
NODE 2 ADMIN CONNINFO = 'dbname=repdb host=localhost user=postgres port=5432 password=postgres';

Adding Node script:

### create_nodes.slonik

include <preamble.slonik>;
init cluster (id=1, comment='slonytest node 1');
store node (id=2, comment='slonytest subscriber node 2', event node=1);

Storing Path:

### store_paths.slonik

include <preamble.slonik>;
STORE PATH (SERVER=1, CLIENT=2, CONNINFO='dbname=postgres host=localhost user=postgres port=5432 password=password');
STORE PATH (SERVER=2, CLIENT=1, CONNINFO='dbname=repdb host=localhost user=postgres port=5432 password=password');

Master Slon Process conf file:

## Master Slon Conf:

cleanup_interval="10 minutes"
conn_info='dbname=postgres host=localhost user=postgres port=5432 password=postgres'
sql_on_connection="SET log_min_duration_statement TO '1000';"

Slave Slon process conf file:

## Slave Slon Conf:
cleanup_interval="10 minutes"
conn_info='dbname=repdb host=localhost user=postgres port=5432 password=Empid#042'
sql_on_connection="SET log_min_duration_statement TO '1000';"

After creating the above configuration, the user can set up Slony replication.

Now, let's look at the features added in Slony 2.1.

1. Support for adding tables in bulk

In previous versions of Slony, if a user had to add the tables of a particular schema, then he had to write slonik commands as given below:

include <preamble.slonik>;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table (id=1, set id=1, origin=1, fully qualified name='public.dept', comment='slonytest table public.dept');
set add table (id=2, set id=1, origin=1, fully qualified name='public.emp', comment='slonytest table public.emp');
set add table (id=3, set id=1, origin=1, fully qualified name='public.employees', comment='slonytest table public.employees');

This was a bit of work: the user had to list all the tables and add each one to the slonik command, or else write a script to generate the slonik commands.

Now, in Slony 2.1, the user can write a single command to add all tables of a schema to Slony replication, as given below:

include <preamble.slonik>;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table( set id=1, tables='public.*');

A snapshot of the activity is given below:

edbs-MacBook-Pro:slony vibhor$ cat create_set.slonik
include <preamble.slonik>;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table( set id=1, tables='public.*');
edbs-MacBook-Pro:slony vibhor$
edbs-MacBook-Pro:slony vibhor$ slonik < create_set.slonik

The user can verify the tables added in bulk using the following command:

postgres=# select * from _slonytest.sl_table;
 tab_id | tab_reloid | tab_relname | tab_nspname | tab_set |   tab_idxname   | tab_altered |   tab_comment
--------+------------+-------------+-------------+---------+-----------------+-------------+------------------
      1 |      16438 | dept        | public      |       1 | dept_pk         | f           | replicated table
      2 |      16445 | emp         | public      |       1 | emp_pk          | f           | replicated table
      3 |      16995 | employees   | public      |       1 | employees_pkey  | f           | replicated table
      4 |      16987 | entities    | public      |       1 | entities_pkey   | f           | replicated table
      5 |      16456 | jobhist     | public      |       1 | jobhist_pk      | f           | replicated table
      6 |      16967 | person_job  | public      |       1 | person_job_pkey | f           | replicated table
(6 rows)

If the user wants to add all sequences of a particular schema to Slony replication, then he can try something like this:

include <preamble.slonik>;
set add sequence(set id=1, sequences='public.*seq'); ## Adding all sequences of the public schema

The user can also use patterns while adding tables, as given below:

include <preamble.slonik>;
set add table(set id=1, tables='public.slonytest_[1234]', add sequences=true); ## adding tables slonytest_1 through slonytest_4
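The table patterns above behave like regular expressions applied to qualified table names. As a rough illustration of how such a pattern selects tables (an approximation in Python, not Slony's exact matcher):

```python
import re

def match_tables(pattern, qualified_names):
    """Return the qualified names selected by a regex-style table pattern.

    Approximation for illustration: anchor the pattern so it must match
    the whole name, then keep every matching table.
    """
    rx = re.compile(pattern + r"$")
    return [name for name in qualified_names if rx.match(name)]

tables = ["public.slonytest_1", "public.slonytest_2",
          "public.slonytest_9", "public.other"]
```

With this sketch, `'public.slonytest_[1234]'` picks up only `slonytest_1` and `slonytest_2` from the list above, while `'public.*'` picks up everything in the schema.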

2. Implicit WAIT FOR

In previous versions of Slony, the user had to use a WAIT FOR clause before executing commands like MERGE SET/MOVE SET:

     MERGE SET ( ID = 1, ADD ID = 999, ORIGIN = 1 );

Now, in 2.1, if the user forgets the WAIT FOR clause, then Slony will use an implicit WAIT FOR whenever it is required. However, the user has to keep the following in mind:
a. The user should not run multiple slonik scripts in parallel, since running parallel slonik scripts may confuse Slony's implicit WAIT FOR behavior.
b. The slonik command will use WAIT FOR if the event node changes.
c. The slonik commands SUBSCRIBE SET/DROP NODE/CLONE NODE will wait until the primary has caught up with the other slave nodes before executing.
d. The slonik command CREATE SET will wait until all outstanding DROP SET commands are confirmed by the master and slaves in the Slony replication cluster.
e. The WAIT FOR command will not work in a TRY block (an incompatibility with earlier versions).

3. Support for replicating TRUNCATE
Slony 2.1 has introduced the feature of replicating the TRUNCATE command to slaves. TRUNCATE replication is possible for PostgreSQL version >= 8.4.
When a user subscribes a table using Slony 2.1, Slony creates a _slonytest_truncatetrigger trigger on the master to replicate those TRUNCATE commands.
Let's see it work:

edbs-MacBook-Pro:~ vibhor$ psql -c "truncate table person_job" postgres
edbs-MacBook-Pro:~ vibhor$ psql -c "select * from person_job" repdb
 pid | job
-----+-----
(0 rows)

4. Health checks at startup
Slony 2.1 now checks the status of each node during startup and gives a proper message for fixing any issue. In the health check it verifies the configuration information and returns OK if there is no problem. The PL/pgSQL function which has been added for the health check is given below:

CREATE OR REPLACE FUNCTION _slonytest.slon_node_health_check()
 RETURNS boolean
 LANGUAGE plpgsql
AS $function$
declare
		prec record;
		all_ok boolean;
begin
		all_ok := 't'::boolean;
		-- validate that all tables in sl_table have:
		--      sl_table agreeing with pg_class
		for prec in select tab_id, tab_relname, tab_nspname from
		"_slonytest".sl_table t where not exists (select 1 from pg_catalog.pg_class c, pg_catalog.pg_namespace n
				where c.oid = t.tab_reloid and c.relname = t.tab_relname and c.relnamespace = n.oid and n.nspname = t.tab_nspname) loop
				all_ok := 'f'::boolean;
				raise warning 'table [id,nsp,name]=[%,%,%] - sl_table does not match pg_class/pg_namespace', prec.tab_id, prec.tab_relname, prec.tab_nspname;
		end loop;
		if not all_ok then
		   raise warning 'Mismatch found between sl_table and pg_class.  Slonik command REPAIR CONFIG may be useful to rectify this.';
		end if;
		return all_ok;
end;
$function$;

5. Performance improvement in cases of large backlog
Slony 2.1 comes with a performance improvement for cases of large replication backlog. In previous versions, users experienced a performance impact when the sl_log_* tables grew and replicating data to subscribers took so long that Slony needed a huge amount of time to catch up. The main cause of this performance issue was that Slony used a sequential scan while pulling data from the sl_log_* tables. Now, in 2.1, the queries which pull data from sl_log_* have been modified so that they use an index scan.

6. Monitoring thread to provide better monitoring data
2.1 also comes with a monitoring table, sl_components. This table is very useful for monitoring the Slony threads. A description of sl_components can be found at the following link:

A snapshot of the information displayed by sl_components is given below:

postgres=# select * from sl_components;
       co_actor       | co_pid | co_node | co_connection_pid |   co_activity    |       co_starttime        |  co_event  | co_eventtype
----------------------+--------+---------+-------------------+------------------+---------------------------+------------+--------------
 local_listen         |  26688 |       1 |             26692 | thread main loop | 2011-11-03 12:07:56+05:30 |            | n/a
 local_sync           |  26688 |       0 |             26697 | thread main loop | 2011-11-03 12:08:00+05:30 |            | n/a
 remote listener      |  26688 |       2 |             26695 | thread main loop | 2011-11-03 12:08:00+05:30 |            | n/a
 remoteWorkerThread_2 |  26688 |       2 |             26694 | SYNC             | 2011-11-03 12:08:00+05:30 | 5000004917 | SYNC
 local_monitor        |  26688 |       0 |             26698 | thread main loop | 2011-11-03 00:32:19+05:30 |            | n/a
 local_cleanup        |  26688 |       0 |             26696 | cleanupEvent     | 2011-11-03 12:00:15+05:30 |            | n/a
(6 rows)

PL/Perl Getting Data From External File in PostgreSQL/PPAS

Up to PostgreSQL 9.0, PostgreSQL doesn't have any feature which can be used to read an external file (the concept of an external table).

An external table is a database feature with which anyone can read a flat file as if it were a table in the database.

So, I thought to give an idea of how a user can implement this in the database using PL/Perl.

Let’s see how you can implement it.

I have a flat file, which has following data:

Location: /tmp/


Since I know this flat file has two fields (numeric and text), I can write a PL/Perl function which reads the file and extracts the attributes (column values) of each line on the basis of a passed delimiter. With this, if there is any bad record, it will go to a bad file for verification if required.

Following is a sample plperl function:

CREATE OR REPLACE FUNCTION read_external(text,text) returns table(id numeric, col1 text) AS $$
## Written by Vibhor to Read external file
my ($datafile, $delimiter) = @_;
my $badfile = ">>" . $datafile . ".bad";
open (BADFILE, $badfile) or elog(ERROR, "could not open bad file");
open (DATFILE, $datafile) or elog(ERROR, "could not open data file");
while (my $line = <DATFILE>) {
    chomp($line);
    my @fields = split(/$delimiter/, $line);
    if (defined $fields[0] && $fields[0] =~ /^[0-9]+$/) {
        return_next({id => $fields[0], col1 => $fields[1]});
    } else {
        print BADFILE "$line\n";
    }
}
close(DATFILE);
close(BADFILE);
return undef;
$$ language plperlu;

And I have made a view which can be used to SELECT the records of the flat file as a table. Following is the result:

postgres=# create or replace view external_table as select * from read_external('/tmp/',',');
postgres=# select * from external_table;
 id |  col1
----+--------
  1 | Check1
  2 | Check2
  3 | check3
  4 | check4
(4 rows)
And I know there is a bad record, 5a,check5, which I can find in the bad file:

postgres=# \! cat /tmp/

PostgreSQL/Postgres Plus advanced Server Process Details

Following are some of the important processes of PostgreSQL/Postgres Plus Advanced Server.

vibhore@ubuntu:~/edb/edb-postgres$ ps -eaf|grep postgre
1002      1080     1  0 06:07 ?        00:00:00 /opt/PostgresPlus/8.4AS/bin/edb-postgres -D /opt/PostgresPlus/8.4AS/data (Mandatory)
1002      1101  1080  0 06:07 ?        00:00:00 postgres: logger process   (Mandatory)
1002      1103  1080  0 06:07 ?        00:00:01 postgres: writer process                                               (Mandatory)
1002      1104  1080  0 06:07 ?        00:00:01 postgres: wal writer process (Mandatory)
1002      1105  1080  0 06:07 ?        00:00:00 postgres: autovacuum launcher process  (Optional if autovacuum is on)
1002      1106  1080  0 06:07 ?        00:00:00 postgres: stats collector process      (Mandatory)

Above is a snapshot of the processes which exist in PostgreSQL/Postgres Plus Advanced Server.
Postmaster: This is the supervisor among the other processes; it monitors them and spawns new backend processes when user connection requests come in. It is the process responsible for host-based and user-based authentication. This process also checks the consistency of the database and accordingly performs recovery of the PostgreSQL instance.

Logger Process: This process is responsible for logging the details of activity from startup to shutdown of the PostgreSQL instance. The logging behavior of this process can be controlled by the following categories of parameters:
Category: Where to Log: Parameters:
1. log_destination (string): This parameter tells logger process about methods for logging server messages, including stderr, csvlog and syslog.
2. logging_collector (boolean): This parameter allows messages sent to stderr, and CSV-format log output, to be captured and redirected into log files.
3. log_directory (string): This parameter specifies directory of logfile.
4. log_filename (string): When logging_collector is enabled, this parameter sets the file names of the created log files. The value is treated as a strftime pattern, so %-escapes can be used to specify time-varying file names.
5. log_rotation_age (integer): This parameter determines the maximum lifetime of an individual log file.After this many minutes have elapsed, a new log file will be created. Set to zero to disable time-based creation of new log files. This parameter can only be set in the postgresql.conf file or on the server command line.
6. log_rotation_size (integer): This parameter determines the maximum size of an individual log file. After this many kilobytes have been emitted into a log file, a new log file will be created.
7. log_truncate_on_rotation (boolean): This parameter will cause PostgreSQL to truncate (overwrite), rather than append to, any existing log file of the same name. However, truncation will occur only when a new file is being opened due to time-based rotation, not during server startup or size-based rotation.
8. syslog_facility (enum): This parameter determines the syslog “facility” to be used. You can choose from LOCAL0, LOCAL1, LOCAL2, LOCAL3, LOCAL4, LOCAL5, LOCAL6, LOCAL7; the default is LOCAL0. See also the documentation of your system’s syslog daemon.
9. syslog_ident (string): This parameter determines the program name used to identify PostgreSQL messages in syslog logs. The default is postgres.
10. silent_mode (boolean): When this parameter is set, the server will automatically run in the background and disassociate from the controlling terminal. This parameter can only be set at server start.
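Putting the "where to log" parameters together, an illustrative postgresql.conf fragment might look like this (the values are examples, not recommendations):

```
# postgresql.conf -- illustrative logging configuration
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1440          # minutes: rotate daily
log_rotation_size = 10240        # kilobytes: rotate at 10MB
log_truncate_on_rotation = off
```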

Category: When to Log: Parameters:
The following parameters control the message levels:
1. client_min_messages (enum)
2. log_min_messages (enum)
3. log_error_verbosity (enum)
4. log_min_error_statement (enum)
5. log_min_duration_statement (integer)
Details about the above parameter values can be found at the following link:

Category: What to Log: Parameters:
1. Debug Parameters:
debug_print_parse (boolean)
debug_print_rewritten (boolean)
debug_print_plan (boolean)
These parameters enable various debugging output to be emitted. When set, they print the resulting parse tree, the query rewriter output, or the execution plan for each executed query.
2. debug_pretty_print (boolean): This indents the messages produced by debug_print_parse, debug_print_rewritten, or debug_print_plan. This results in more readable but much longer output than the “compact” format used when it is off.
3. log_checkpoints (boolean): This causes checkpoints to be logged in the server log. Some statistics about each checkpoint are included in the log messages, including the number of buffers written and the time spent writing them.
4. log_connections (boolean): This parameter causes each attempted connection to the server to be logged, as well as successful completion of client authentication.
5. log_disconnections (boolean): This outputs a line in the server log, similar to log_connections, but at session termination, and includes the duration of the session.
6. log_duration (boolean): This parameter causes the duration of every completed statement to be logged.
7. log_lock_waits (boolean): It controls whether a log message is produced when a session waits longer than deadlock_timeout to acquire a lock.
8. log_statement (enum): It controls which SQL statements are logged. Valid values are none, ddl, mod, and all. ddl logs all data definition statements, such as CREATE, ALTER, and DROP statements. mod logs all ddl statements, plus data-modifying statements such as INSERT, UPDATE, DELETE, TRUNCATE, and COPY FROM. PREPARE, EXECUTE, and EXPLAIN ANALYZE statements are also logged if their contained command is of an appropriate type.
9. log_temp_files (integer): It controls logging of use of temporary files. Temporary files can be created for sorts, hashes, and temporary query results. A log entry is made for each temporary file when it is deleted.
10. log_timezone (string): It sets the time zone used for timestamps written in the log. Unlike timezone, this value is cluster-wide, so that all sessions will report timestamps consistently.

writer process: This is the background (BG) writer process. It writes dirty shared buffers to disk on the following events:
1. When a checkpoint occurs.
2. When the checkpoint timeout elapses.
3. When the number of WAL segments specified in checkpoint_segments has been filled since the last checkpoint.
4. When there is no space left in shared buffers for new blocks.

Parameters which can help in controlling the background writer's activity are given below:
1. bgwriter_delay (integer): This parameter specifies delay between activity rounds for the background writer. In each round the writer issues writes for some number of dirty buffers (controllable by the following parameters). It then sleeps for bgwriter_delay milliseconds, and repeats.
2. bgwriter_lru_maxpages (integer): This parameter specifies the maximum number of buffers that will be written to disk in each round. Setting this to zero disables background writing (except for checkpoint activity).
3. bgwriter_lru_multiplier (floating point): This parameter is used in estimating the number of buffers that will be needed during the next round. The formula used is given below:
(average recent need) * bgwriter_lru_multiplier
Dirty buffers are written until there are ((average recent need) * bgwriter_lru_multiplier) clean, reusable buffers available.
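As a back-of-the-envelope sketch of that formula (the function and variable names here are mine, not server internals, and the real algorithm also smooths the recent-allocation average):

```python
def buffers_to_clean(avg_recent_need, lru_multiplier, lru_maxpages, clean_available):
    """Estimate how many dirty buffers the bgwriter writes in one round.

    Illustrative only: target a pool of (avg_recent_need * lru_multiplier)
    clean buffers, write only the shortfall, and cap the round's work at
    bgwriter_lru_maxpages.
    """
    target = avg_recent_need * lru_multiplier           # clean buffers we want available
    deficit = max(0, int(target) - clean_available)     # how many are missing
    return min(deficit, lru_maxpages)                   # capped per round
```

This also shows why bgwriter_lru_maxpages = 0 disables background writing: the per-round cap clamps the result to zero regardless of the deficit.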

wal writer process: This process writes all the transactions from the WAL buffer to the WAL files in pg_xlog. It is similar to the log writer in Oracle (which flushes the redo log buffer to the redo logfiles). The parameters which can be used to control the behavior of the WAL writer can be found at the following link:

stats collector process:
The collector can count accesses to tables and indexes in both disk-block and individual-row terms. It also tracks the total number of rows in each table, and information about vacuum and analyze actions for each table. It can also count calls to user-defined functions and the total time spent in each one.
Information on the parameters which can be used to control its activity can be found at the following link:

autovacuum process:
Autovacuum is a process which performs the following activities:
1. To recover or reuse disk space occupied by updated or deleted rows.
2. To update data statistics used by the PostgreSQL query planner.
3. To protect against loss of very old data due to transaction ID wraparound.

Up to 8.2, autovacuum was off by default. From 8.3 onwards it is on by default, which means it is a default process from 8.3 onwards.
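An illustrative postgresql.conf fragment for autovacuum is given below (the values shown are typical defaults; check your version's documentation before relying on them):

```
autovacuum = on                     # enabled by default from 8.3 onwards
log_autovacuum_min_duration = 0     # log every autovacuum action
autovacuum_naptime = 1min           # delay between runs on a database
autovacuum_vacuum_threshold = 50    # min updated/deleted tuples before vacuum
autovacuum_analyze_threshold = 50   # min changed tuples before analyze
```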