Postgres and Transparent Data Encryption (TDE)

Security has always been a major concern for enterprises. Especially if crucial information is stored in a database, you want strong security around it. Over the years, technologies have evolved to provide better solutions for this.

If you store very sensitive information, you typically keep it encrypted, so that even if somebody gains access to the system, they cannot view the information unless they are authorized.

For managing sensitive information, enterprises use multiple methods:

  1. Encrypting specific information.

PPAS users can use the DBMS_CRYPTO package, which provides a way of encrypting sensitive information in the database.

For more information, please refer to the following link:

For PostgreSQL, users can use the pgcrypto module.
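As a small illustration of pgcrypto, its symmetric PGP functions can encrypt a value on the way in and decrypt it on the way out. This is only a sketch; the table name, column, and key used here are made up:

```sql
-- Requires the pgcrypto extension:
CREATE EXTENSION pgcrypto;

-- Hypothetical table holding an encrypted column:
CREATE TABLE customer_secrets (id serial PRIMARY KEY, card_no bytea);

-- Encrypt with a symmetric key on insert:
INSERT INTO customer_secrets (card_no)
VALUES (pgp_sym_encrypt('4111-1111-1111-1111', 'my-secret-key'));

-- Decrypt when an authorized user reads it back:
SELECT pgp_sym_decrypt(card_no, 'my-secret-key') FROM customer_secrets;
```

Only sessions that can supply the key can read the plaintext; everyone else sees only the bytea ciphertext.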

  2. Transparent Data Encryption (TDE) is another method, employed by both Microsoft and Oracle, to encrypt database files. TDE offers encryption at the file level. This method solves the problem of protecting data at rest, i.e. encrypting databases both on the hard drive and consequently on backup media. Enterprises typically employ TDE to address compliance requirements such as PCI DSS.

Postgres Plus currently doesn’t have built-in TDE; however, if enterprises are looking for encryption at the database file level, they can use one of the following methods for protecting data at rest:

  1. Full Disk Encryption:

Full disk or partition encryption is one of the best ways of protecting your data. This method protects not only each file but also the temporary storage that may contain parts of these files. Full disk encryption protects all of your files, so you do not have to worry about selecting what to protect and possibly missing a file.

RHEL (Red Hat) supports Linux Unified Key Setup-on-disk-format (LUKS), which bulk-encrypts hard drive partitions.

For more information on LUKS, please refer to the following link:

  2. File system-level encryption:

File system-level encryption is often called file/directory encryption. In this method, individual files or directories are encrypted by the file system itself.

Stackable cryptographic file system encryption is also available, which users can utilize in their environment.

File system-level encryption offers the following advantages:

  1. Flexible file-based key management: each file can be, and usually is, encrypted with a separate encryption key.
  2. Individual management of encrypted files, e.g. incremental backups of the individual changed files, even in encrypted form, rather than backups of the entire encrypted volume.
  3. Access control can be enforced through public-key cryptography, and cryptographic keys are held in memory only while the file decrypted by them is held open.

Stackable cryptographic file system encryption can be used with Postgres for Transparent Data Encryption.

In this blog, I will discuss using mount ecryptfs, as it requires less setup overhead (LUKS requires a new disk to be configured and formatted before storing data on it, while “mount ecryptfs” works with existing directories and data).

If enterprises want to give DBAs control over TDE, they can define a few sudo rules that allow the DBAs to execute the encryption commands.

The following is one method they can use:

  • Ask the system admin to create sudo rules that allow the DBA to encrypt the data directory for Postgres Plus. One common way to do this is using the “mount ecryptfs” command on Linux operating systems.
  • If the user needs to encrypt the /ppas94/data directory, they can use the following command:
sudo mount -t ecryptfs /ppas94/data /ppas94/data
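Such a sudo rule could look like the following sketch; the enterprisedb user name, the file name, and the paths are assumptions to adapt to your environment (edit with visudo):

```
# /etc/sudoers.d/ppas-tde (hypothetical file)
# Allow the enterprisedb OS user to mount/unmount the encrypted data directory
enterprisedb ALL=(root) NOPASSWD: /bin/mount -t ecryptfs /ppas94/data /ppas94/data, /bin/umount /ppas94/data
```

With this in place, the DBA can run the mount command via sudo without needing full root access.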

More information can be found in the documentation from RHEL:

Users can also specify the encryption key type (passphrase, openssl), cipher (aes, des3_ede, …), key byte size, and other options with the above command.

An example is given below:

# mount -t ecryptfs /home /home -o ecryptfs_unlink_sigs \
  ecryptfs_key_bytes=16 ecryptfs_cipher=aes ecryptfs_sig=c7fed37c0a341e19

CentOS 7 and RHEL 7 do not ship with ecryptfs by default; therefore, users can also use the encfs command.

For more information on encfs, please refer to the following link:

The following are the steps to use encfs to encrypt the data directory.

  1. Create a data directory using the following commands, as the enterprisedb user:
   mkdir /var/lib/ppas/9.4/encrypted_data
   chmod 700 /var/lib/ppas/9.4/encrypted_data
  2. Use the following encfs command to encrypt the data directory:
         encfs /var/lib/ppas/9.4/encrypted_data/ /var/lib/ppas/9.4/data

A snapshot of the above command is given below:

encfs /var/lib/ppas/9.4/encrypted_data /var/lib/ppas/9.4/data

The directory "/var/lib/ppas/9.4/data" does not exist. Should it be created? (y,n) y
Creating new encrypted volume.

Please choose from one of the following options:

enter "x" for expert configuration mode,

enter "p" for pre-configured paranoia mode,

anything else, or an empty line will select standard mode.


Standard configuration selected.

Configuration finished.  The filesystem to be created has

the following properties:

Filesystem cipher: "ssl/aes", version 3:0:2

Filename encoding: "nameio/block", version 4:0:2

Key Size: 192 bits

Block Size: 1024 bytes

Each file contains 8 byte header with unique IV data.

Filenames encoded using IV chaining mode.

File holes passed through to ciphertext.

Now you will need to enter a password for your filesystem.

You will need to remember this password, as there is absolutely

no recovery mechanism.  However, the password can be changed

later using encfsctl.

New Encfs Password: 

Verify Encfs Password: 

  3. After encrypting the data directory, users also need to modify the postgresql service script to include the proper command for supplying the password. For that, they can either use sshpass or write their own program which passes the password for mounting the directory.

As you can see, achieving Transparent Data Encryption in Postgres is very easy.

Meet BART – A New Tool for Backup And Recovery Management

EnterpriseDB recently launched a new tool for backup and recovery – named simply EDB Backup and Recovery Tool, or BART. This tool makes the DBA’s life easier by simplifying the tasks for managing their Postgres physical backup and recovery tasks, whether they are PostgreSQL or Postgres Plus Advanced Server deployments.

BART has the following advantages over custom scripts for managing backups:

1. It’s stable and it uses the tool pg_basebackup to take a physical backup. This tool has been well defined and is well-supported by the PostgreSQL community.

2. It catalogs all of the backups users are taking, which is important in terms of:
    i. Listing the type of backups used
   ii. Listing the status of those backups with server information.

3. BART also provides functionality to restore backups, with all required archived WAL files. So automation around this tool will make DBAs’ lives easier for restore and recovery.

4. BART provides an option to validate your backup by using checksum. This is useful for confirming you took a valid backup and it is not corrupted at disk level.

5. BART provides an option to define your retention policy around the backups you are keeping.

Given all of the above advantages, I decided to give this new tool a try and share some tips. To get started, you need the following prerequisites:

1. BART currently requires a 64-bit Linux platform: CentOS 6.x or RHEL 6.x
2. Password-less, direct SSH access to the target machine where you want to restore backups, as well as to the database servers you want backed up
3. The Postgres Plus Advanced Server or PostgreSQL binaries for pg_basebackup

Yum or rpm

To install this tool, you have two options that I will explore below:

1. Yum command
2. Rpm command.

Using the yum command:

To perform a yum installation, BART users can ask EDB for credentials to the EnterpriseDB yum repository and configure their local yum repository as follows:

echo "[tools]
name=EnterpriseDB Tools
gpgcheck=0" > /etc/yum.repos.d/edbtools.repo

After creating the yum repo, the user can execute the following command to install BART:

 yum install edb-bart

If the user doesn’t want to install the EDB Backup and Recovery Tool using yum, they can download a standalone rpm from EDB’s website using the link below:

and then enter the rpm install command as follows:

rpm -ivh edb-bart-1.0.1-1.rhel6.x86_64.rpm

After installing BART using the above commands, the user can find the binaries in /usr/edb-bart-1.0/bin and a sample BART configuration file in /usr/edb-bart-1.0/etc.

That’s a very easy installation.

For more information on configuring BART Host and Database Host, the following are some documents that will help:
1. pg_basebackup configuration for PostgreSQL:

2. For direct password-less ssh configuration, the user can refer to the following link:

After the installation of the BART binaries, the user also has to create a BART configuration file.

The following is a sample configuration file for BART:

[BART]
bart-host= enterprisedb@
backup_path = /opt/backup
pg_basebackup_path = /usr/ppas-9.4/bin/pg_basebackup
logfile = /tmp/bart.log

[PG]
host =
port = 5432
user = postgres
description = "Postgres server"

[PPAS94]
host =
port = 5444
user = enterprisedb
description = "PPAS 94 server"

Global Configuration Settings

The content under the [BART] tag comprises the global configuration settings. Under this tag are the following:

1. bart-host: the IP address of the host on which BART is installed. The value for this parameter must be specified in the form: bart_user@bart_host_address, where bart_user is the operating system user account on the BART host that is used to run BART and owns the BART backup catalog directory. bart_host_address is the IP address of the BART host.

2. backup_path: specifies the file system parent directory where all BART database server base backups and archived WAL files are stored. This parameter is required.

3. pg_basebackup_path: specifies the path to the pg_basebackup program of the Postgres database server installed on the BART host.

4. logfile: specifies the path to the BART log file. This parameter is optional. If no path to a log file is specified after logfile =, or if the parameter is commented out, BART does not create a log file.

The remaining part of the configuration file is self-explanatory: the [PG]/[PPAS94] tags introduce the settings for each server which the user wants to back up.

Pg_basebackup Settings

After performing the above configuration on the backup server, the user has to apply the following settings on the servers they want to back up. Below are the settings for enabling backup using pg_basebackup.

The user has to set a few parameters in the postgresql.conf file of the PostgreSQL instance to be backed up:

1. wal_level set to archive or hot_standby.
2. archive_mode = on
3. archive_command set appropriately.
4. max_wal_senders set to 1 or more, since pg_basebackup uses the replication protocol to copy the data directory.
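Put together, the relevant postgresql.conf lines might look like the following sketch; the archive destination path is an assumption:

```
# postgresql.conf (illustrative values)
wal_level = hot_standby                    # or 'archive'
archive_mode = on
archive_command = 'cp %p /opt/archive/%f'  # assumed archive destination
max_wal_senders = 2                        # pg_basebackup uses the replication protocol
```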

For more information on each setting please refer to the following:
1. wal_level:

2. archive_mode and archive_command:

3. max_wal_senders:

With the above settings, the user then needs to update the pg_hba.conf file for the replication connection.
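For example, a pg_hba.conf entry permitting a replication connection from the BART host could look like the following; the address and user name are assumptions:

```
# pg_hba.conf (illustrative entry)
# TYPE  DATABASE     USER          ADDRESS           METHOD
host    replication  enterprisedb  192.168.1.10/32   md5
```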

Note: The above settings are for pg_basebackup to take backups using the replication protocol. In case users need more information about pg_basebackup and its settings, please use the above-mentioned links.

How BART Works

Now, since we have configured both servers, let’s have a look how BART works.

The following command executes a backup:

 bart -c bart.cfg BACKUP -s ppas94

And below is the output:

[bart@localhost ~]$ bart -c bart.cfg BACKUP -s ppas94

INFO:  creating backup for server 'ppas94'
INFO:  backup identifier: '1413852137762'
6394456/6394456 kB (100%), 1/1 tablespace

INFO:  backup checksum: 7f49ea9653511308710c174f22ec765d
INFO:  backup completed successfully
[bart@localhost ~]$ 

That was an easy way to take a backup. The DBA can also create a job to execute the above command to take backups.

If the user wants to list the backup using BART, the user can use the option SHOW-BACKUPS:

[bart@localhost ~]$ bart -c bart.cfg SHOW-BACKUPS -s ppas94
 Server Name   Backup ID       Backup Time           Backup Size  
 ppas94        1413852137762   2014-10-20 17:43:41   6244.59 MB   

This is useful for knowing what backups a user has available for recovery. The above command gives important information:

1.	Backup ID: It’s a unique ID for the physical backup
2.	Backup Time: Time when backup was taken
3.	Backup Size: Size of backup

This information is useful when a user wants to plan for recovery using backup. This way, the user can also plan for disk size.

Sometimes a user wants to verify their backup state. VERIFY-CHKSUM option is useful in this case:

[bart@localhost ~]$ bart -c bart.cfg VERIFY-CHKSUM -s ppas94 -i 1413852137762
 Server Name   Backup ID       Verify  
 ppas94        1413852137762   OK      

I have to say, after putting EDB BART through its paces, I think DBAs will enjoy having such a great tool for making Backup Management easy.

In my next post, I will blog about the Recovery process.

Monitoring approach for Streaming Replication with Hot Standby in PostgreSQL 9.3.

The people using PostgreSQL and the Streaming Replication feature seem to ask many of the same questions:

1. How best to monitor Streaming Replication?

2. What is the best way to do that?

3. Are there alternatives, when monitoring on Standby, to using the pg_stat_replication view on Master?

4. How should I calculate replication lag-time, in seconds, minutes, etc.?

In light of these commonly asked questions, I thought a blog would help. The following are some methods I’ve found to be useful.

Monitoring is critical for large infrastructure deployments where you have Streaming Replication for:

1. Disaster recovery

2. High availability

3. Load balancing, when using Streaming Replication with Hot Standby

PostgreSQL has some building blocks for replication monitoring, and the following are some important functions and views which can be used for monitoring the replication:

1. pg_stat_replication view on master/primary server.

This view helps in monitoring the standby on Master. It gives you the following details:

   pid:              Process ID of the walsender process
   usesysid:         OID of the user used for streaming replication
   usename:          Name of the user used for streaming replication
   application_name: Application name connected to the master
   client_addr:      Address of the standby/streaming replication client
   client_hostname:  Hostname of the standby
   client_port:      TCP port on which the standby communicates with the WAL sender
   backend_start:    Start time when the standby connected to the master
   state:            Current WAL sender state, i.e. streaming
   sent_location:    Last transaction location sent to the standby
   write_location:   Last transaction written to disk at the standby
   flush_location:   Last transaction flushed to disk at the standby
   replay_location:  Last transaction replayed at the standby
   sync_priority:    Priority of the standby server for being chosen as the synchronous standby
   sync_state:       Sync state of the standby (async or sync)


postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+---------------------------------
pid              | 1114
usesysid         | 16384
usename          | repuser
application_name | walreceiver
client_addr      |
client_hostname  |
client_port      | 52444
backend_start    | 15-MAY-14 19:54:05.535695 -04:00
state            | streaming
sent_location    | 0/290044C0
write_location   | 0/290044C0
flush_location   | 0/290044C0
replay_location  | 0/290044C0
sync_priority    | 0
sync_state       | async

2. pg_is_in_recovery(): Function which tells whether the standby is still in recovery mode or not.


postgres=# select pg_is_in_recovery();
 t
(1 row)

3. pg_last_xlog_receive_location(): Function which tells the location of the last transaction log streamed by the standby and written to the standby's disk.


postgres=# select pg_last_xlog_receive_location();
(1 row)

4. pg_last_xlog_replay_location(): Function which tells the last transaction replayed during the recovery process. An example is given below:

postgres=# select pg_last_xlog_replay_location();
(1 row)

5. pg_last_xact_replay_timestamp(): This function tells the timestamp of the last transaction replayed during recovery. Below is an example:

postgres=# select pg_last_xact_replay_timestamp();
 15-MAY-14 20:54:27.635591 -04:00
(1 row)

Above are some important functions/views, which are already available in PostgreSQL for monitoring the streaming replication.

So, the logical next question is, “What’s the right way to monitor the Hot Standby with Streaming Replication on Standby Server?”

If you have Hot Standby with Streaming Replication, the following are the points you should monitor:

1. Check if your Hot Standby is in recovery mode or not:

For this you can use pg_is_in_recovery() function.

2. Check whether Streaming Replication is working or not.

An easy way of doing this is checking the pg_stat_replication view on the Master/Primary. This view gives information on the master only if Streaming Replication is working.

3. Check if Streaming Replication is not working and the Hot Standby is recovering from archived WAL files.

For this, the DBA can either use the PostgreSQL log file to monitor it or utilize the following functions provided in PostgreSQL 9.3:


4. Check how far off is the Standby from Master.

There are two ways to monitor lag for Standby.

   i. Lag in bytes: For calculating the lag in bytes, users can use the pg_stat_replication view on the master with the pg_xlog_location_diff function. Below is an example:

pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location)

which gives the lag in bytes.
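A complete query along these lines (a sketch to run on the master) would be:

```sql
SELECT client_addr,
       pg_xlog_location_diff(sent_location, replay_location) AS lag_bytes
  FROM pg_stat_replication;
```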

  ii. Calculating the lag in seconds. The following is the SQL most people use to find the lag in seconds:

   SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
                 THEN 0
               ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
          END AS log_delay;

Including the above into your repertoire can give you good monitoring for PostgreSQL.

In a future post, I will include the script that can be used for monitoring the Hot Standby with PostgreSQL streaming replication.

New in PostgreSQL 9.3: New in Functions

In the series of blogs about new features in 9.3, today I thought about covering the new functions and improvements coming in PostgreSQL.

Let's look at what's new in 9.3 in terms of built-in functions:

1. New array functions for one-dimensional arrays

PostgreSQL 9.3 comes with new functions which help users manipulate one-dimensional arrays by calling simple functions, in place of crafting their own functions and methods to do the modification.

i. array_remove function

This is a new function added in 9.3 which provides the ability to remove elements from an array. The function takes two arguments:
a. the one-dimensional array from which the user wants to remove elements
b. the element value which the user wants to remove

The syntax of this function is given below:

ARRAY_REMOVE(<one dimensional array>, element)

An example of array_remove is given below:

postgres=# select array_remove(ARRAY['First','Second','Delete','Four'],'Delete');
 {First,Second,Four}
(1 row)

ii. array_replace function

This is new in 9.3; it helps the user replace any element in an array. This function can be used for multi-dimensional arrays or for single-dimensional arrays.
An example of array_replace is given below:

pgsqltest=# select array_replace(ARRAY[ARRAY[1,2],ARRAY[3,4],ARRAY[56,6],ARRAY[7,8]],56,5);
 {{1,2},{3,4},{5,6},{7,8}}
(1 row)

2. VARIADIC-labeled argument expansion for the concat and format functions

This is a new addition to the concat and format functions. Pre-9.3, the concat and format functions used to ignore the VARIADIC label and didn't print the right output. However, this has been fixed in 9.3. Let's look at examples from pre-9.3 and 9.3.

In pre-9.3

worktest=# select concat(variadic array[1,2,3]);
 {1,2,3}
(1 row)

In 9.3

pgsqltest=# select concat(variadic array[1,2,3]);
 123
(1 row)

You can see above that in 9.3, the VARIADIC label works properly.

Let's look at the format() function:

In pre-9.3

worktest=# select format(string_agg('%s',','), variadic array_agg(i)) from generate_series(1,3) g(i);
ERROR:  too few arguments for format

Whoa, it errors out. However, this is fixed in 9.3 :-).

In 9.3

pgsqltest=# select format(string_agg('%s',','), variadic array_agg(i)) from generate_series(1,3) g(i);
 1,2,3
(1 row)

3. Improvements in the format() function to handle field width and left/right alignment

These new additions in 9.3 are going to increase the usability of the format() function for developers. In 9.3, the format function comes with the following enhancements:
i). Proper handling of field width.
ii). Proper handling of left/right alignment.

The above two improvements were missing in pre-9.3. Their addition makes the format() function a follower of the sprintf() C function.

Let's look at an example.
In pre-9.3, use of a field width resulted in an error message, as given below:

worktest=#  select format('>>%10s<<', 'Hello');
ERROR:  unterminated conversion specifier

However, 9.3 comes with proper field width support.

pgsqltest=#  select format('>>%10s<<', 'Hello');
 >>     Hello<<
(1 row)

9.3 also comes with proper handling of left/right alignment. Examples are given below:
Left alignment

pgsqltest=# select format('>>%-10s<<', 'Hello');
 >>Hello     <<
(1 row)

Right alignment

pgsqltest=# select format('>>%1$10s<<', 'Hello');
 >>     Hello<<
(1 row)

4. Proper handling of negative centuries in the to_char, to_date and to_timestamp functions

In pre-9.3, the behavior of the following functions for negative centuries was wrong or inconsistent:
a. to_char
b. to_date
c. to_timestamp

However, 9.3 comes with proper handling for negative centuries. Let's look at the output in pre-9.3 and in 9.3 for each function.


worktest=# select to_timestamp('4713-01-08 BC','YYYY-DD-MM bc');
 01-AUG-13 00:00:00 -04:56:02
(1 row)

Above you can see that it displays the wrong result for BC. In 9.3:

pgsqltest=# select to_timestamp('4713-01-08 BC','YYYY-DD-MM bc');
 4713-08-01 00:00:00-04:56:02 BC
(1 row)

Let's look at the to_date function. In 9.2/pre-9.3:

worktest=# SELECT to_date('4713-01-08 BC','YYYY-DD-MM bc');
ERROR:  full year must be between -4712 and +9999, and not be 0

In 9.3

pgsqltest=# SELECT to_date('4713-01-08 BC','YYYY-DD-MM bc');
 4713-08-01 BC
(1 row)

Above, you can see that in 9.3 the to_date function works correctly, whereas to_date in 9.2 wasn't able to handle it. You can see similar behavior with the to_char function.

5. Improvement in pg_get_viewdef() to print a new line after each SELECT and FROM entry

This is an enhancement made to the pg_get_viewdef function in 9.3, and users will also see it in pg_dump plain-format dumps.

This enhancement improves the readability of a view definition in 9.3 and reduces the line length of the definition. Let's look at how this improvement makes the experience of viewing a view definition much better.

In pre-9.3:

worktest=# select pg_get_viewdef('pg_tables'::regclass);
 SELECT n.nspname AS schemaname, c.relname AS tablename, pg_get_userbyid(c.relowner) AS tableowner, 
t.spcname AS tablespace, c.relhasindex AS hasindexes, c.relhasrules AS hasrules, c.relhastriggers AS
 hastriggers FROM ((pg_class c LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) LEFT JOIN pg_
tablespace t ON ((t.oid = c.reltablespace))) WHERE (c.relkind = 'r'::"char");
(1 row)

Above you can see that pg_get_viewdef printed the definition of the view on one line, which not only makes the view definition difficult to read but also increases the line length.

Let's look at pg_get_viewdef in 9.3:

pgsqltest=# select pg_get_viewdef('pg_tables'::regclass);
  SELECT n.nspname AS schemaname,                            +
     c.relname AS tablename,                                 +
     pg_get_userbyid(c.relowner) AS tableowner,              +
     t.spcname AS tablespace,                                +
     c.relhasindex AS hasindexes,                            +
     c.relhasrules AS hasrules,                              +
     c.relhastriggers AS hastriggers                         +
    FROM ((pg_class c                                        +
    LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))  +
    LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))+
   WHERE (c.relkind = 'r'::"char");
(1 row)

This seems better to me in terms of readability, and it doesn't have long lines.


pg_xlog_location_diff function for PostgreSQL/PPAS

In PostgreSQL 9.2, the community added the function pg_xlog_location_diff(), which is very useful for finding the difference between two xlog locations in bytes, and also useful for monitoring replication. Details of this function are given in the following link:

However, this function is not available to PostgreSQL/PPAS 9.0/9.1 users. So, I thought to write the same function in plpgsql so that users can take advantage of it in 9.0/9.1.

Before using the formula and developing the function, let's understand what xlog and offset are.
Let's consider that a user has called the pg_current_xlog_location() function and gets the following information:

worktest=# select pg_current_xlog_location();
(1 row)

In the above, the first field before the forward slash is the hexadecimal value of the logical xlog file, and the second field, i.e. D1012B80, is the hexadecimal offset inside the logical xlog file.

So, whenever the user sees information for an xlog location, he gets the xlog information in the following format:

(hexadecimal) xlog/ (hexadecimal) offset

To calculate the difference between two xlog locations, the user can use the following formula, which gives the difference in hexadecimal:

(FF000000 * xlog1 + offset1) - (FF000000 * xlog2 + offset2)

where FF000000 is the maximum value of the offset, i.e. in PostgreSQL/PPAS the offset value can go from 00000000 to FF000000, which in decimal gives a maximum of 4278190080 bytes (4080MB).
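The formula can be sanity-checked outside the database with a small shell sketch (the xlog_diff helper name is made up):

```shell
# Compute the byte difference between two xlog locations of the form XLOG/OFFSET
xlog_diff() {
  local a_xlog=${1%%/*} a_off=${1##*/}   # split the first location at '/'
  local b_xlog=${2%%/*} b_off=${2##*/}   # split the second location at '/'
  echo $(( (0xFF000000 * 0x$a_xlog + 0x$a_off) - (0xFF000000 * 0x$b_xlog + 0x$b_off) ))
}

xlog_diff 1/D009F578 0/290044C0
```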

Based on the above formula, the following is a plpgsql function which can be used to get the difference:

CREATE OR REPLACE FUNCTION pg_xlog_location_diff_sql(text, text)
 RETURNS numeric
 LANGUAGE plpgsql
AS $$
DECLARE
       offset1 text;
       offset2 text;
       xlog1 text;
       xlog2 text;
       SQL text;
       diff text;
BEGIN
       /* Extract the offset and xlog from the inputs into
          the offset and xlog variables */
       xlog1   := split_part($1, '/', 1);
       offset1 := split_part($1, '/', 2);
       xlog2   := split_part($2, '/', 1);
       offset2 := split_part($2, '/', 2);
       /* Prepare SQL query for calculation based on the formula
          (FF000000 * xlog1 + offset1) - (FF000000 * xlog2 + offset2),
          which gives the value in hexadecimal. Since hexadecimal calculation is cumbersome,
          convert into decimal and then calculate the difference */
       SQL='SELECT (x'''||'FF000000'||'''::bigint * x'''||xlog1||'''::bigint 
                                +  x'''||offset1||'''::bigint)'||' 
                 - (x'''||'FF000000'||'''::bigint * x'''||xlog2||'''::bigint 
                                +  x'''||offset2||'''::bigint)';
       EXECUTE SQL into diff;
       /* Return the value in numeric by explicit casting */
       RETURN diff::numeric;
END;
$$;

A usage example is given below:

worktest=# select pg_xlog_location_diff_sql(pg_current_xlog_location(),'1/D009F578');
(1 row)

I hope this will help PostgreSQL/PPAS 9.0/9.1 users.

Automating Binding of Servers with PEM Agent in Postgres Enterprise Manager 2.1.

This is second post of Postgres Enterprise Manager 2.1 series.

A question people often ask: is Postgres Enterprise Manager a useful tool for companies which provide services for PostgreSQL? The answer is yes. What about companies which provide infrastructure support, supplying the server and the PostgreSQL database: is it a good tool for them to monitor PostgreSQL? The answer again is yes, you can use it.

Companies which give infrastructure support and provide Database as a Service always want everything automated, i.e. along with provisioning a server, the server should have PostgreSQL and its components installed, plus they want automatic installation of the pemagent, which is also achievable. However, they get stumped on automatic binding of the PEM Agent with PostgreSQL.

The simple method of binding PostgreSQL with the PEM Agent is using the PEM Client:
in the PEM Client, add a PostgreSQL server to the PEM Directory, and then the user can bind the PEM agent with the PostgreSQL cluster.

File -> Add Server -> which will pop up the "New server Registration" screen -> after adding the server information (also click on "Store PEM Server")

Then in the "New server Registration" window ->

 Click on the "PEM Agent" tab -> choose an agent from the "Bound Agent" drop-down; then the user can include the information.

The above is one method. However, a company which provides infrastructure support doesn't want to do this manual work; they want this activity automated.

For automating this process, the user can do the following:
1. Install the PEM agent in unattended mode. For details of installing the pemagent in unattended mode, the user can use a command like:

./ --prefix /opt/PEM --mode unattended --pghost <pem server hostname> --pguser <pem server username and password> --agent_description <Agent Hostname>

2. Create a .pgpass file in the root/admin user's home directory. (On Linux, the root home directory is /root.)

# cat .pgpass
*:5432:*:postgres:<your password>

Change the permissions on .pgpass:

chmod 600 .pgpass

3. After creating the .pgpass file in the home directory, execute the following SQL statements:

/opt/PEM/agent/bin/edb-psql -h <PEM Server hostname> -p 5432 -c "
INSERT INTO pem.server(description,server,port,database) VALUES('New server','IP Address',5432,'postgres');" -U postgres

/opt/PEM/agent/bin/edb-psql -h <PEM Server hostname> -p 5432 -c "INSERT INTO pem.server_option(server_id,pem_user,username,server_group) VALUES((SELECT id FROM pem.server WHERE server='IP address' AND active=true),'postgres','postgres','PEM Server Directory');" -U postgres

/opt/PEM/agent/bin/edb-psql -h <PEM Server hostname> -p 5432 -c "INSERT INTO pem.agent_server_binding(agent_id,server_id,server,port,username,database)
VALUES((SELECT id FROM pem.agent WHERE description ='Agent Hostname' AND active=true),(SELECT id FROM pem.server WHERE server='IP address' AND active=true),'IP address',5432,'postgres','postgres');" -U postgres

After executing the above SQL statements, you have successfully bound the agent with the server for monitoring.

The user can also include the above steps in a shell script and execute it while provisioning a new server for their client.

Have Fun!

Partition Syntax Support in PPAS 9.1

In the series on new features in Advanced Server 9.1, today I thought to write about the partition table syntax supported in PPAS 9.1. In PostgreSQL, and up to PPAS 9.0, the user has to follow the method given below for partitioning a table:
1. Create a parent table.
2. Create child tables using the inheritance feature.
3. Create a trigger on the parent table, so that the trigger can redirect inserts to the right partition.
4. And if the user has to add a new child table, steps 2 and 3 have to be done again.

Now, in PPAS 9.1, the user doesn't have to perform the above activities: PPAS 9.1 supports PARTITION TABLE syntax.

Let's see how the PARTITION syntax in PPAS can make users'/DBAs' lives easier.

We know that PPAS/PostgreSQL supports two types of partitioning (range and list). So, we will see how simple it is with the CREATE PARTITION syntax.
Let's create a RANGE partition as we used to do in PPAS 9.0/PostgreSQL:
1. Create the master table and child tables as given below:

CREATE TABLE partition_master(id numeric primary key,val text);
CREATE TABLE partition_child1(CHECK (id > 0 and id <=10)) inherits(partition_master);
CREATE TABLE partition_child2(CHECK (id >10 and id <=20)) inherits(partition_master);

2. Then add primary keys on both child tables:

alter table partition_child1 add primary key (id);
alter table partition_child2 add primary key (id);

3. Create Trigger, which redirect Inserts to correct child table:

CREATE OR REPLACE FUNCTION part_trig_insrt() RETURNS trigger
AS $$
BEGIN
      IF NEW.id > 0 AND NEW.id <= 10 THEN
           INSERT INTO partition_child1 VALUES(NEW.*);
      ELSIF NEW.id > 10 AND NEW.id <= 20 THEN
           INSERT INTO partition_child2 VALUES(NEW.*);
      ELSE
           RAISE 'inserted partition key doesn''t map to any partition';
      END IF;
      RETURN NULL;
END;
$$ language plpgsql;
CREATE TRIGGER partition_trig_insert BEFORE INSERT ON partition_master FOR EACH ROW execute procedure part_trig_insrt();

Similarly, you have to write a trigger which can handle updates of the partition key.

And whenever the user wants to add a new partition, he has to update the trigger function, create the new partition table, and so on.
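To make the trigger's behavior concrete, here is a small illustrative sketch (in Python, outside the database, not how PPAS implements it) of the routing logic such a trigger encodes: each child covers a half-open range of the partition key, and a key outside every range is an error:

```python
# Illustrative sketch of the trigger's routing logic: each partition
# covers a half-open range (lower, upper] of the partition key.
PARTITIONS = [
    ("partition_child1", 0, 10),   # id > 0  and id <= 10
    ("partition_child2", 10, 20),  # id > 10 and id <= 20
]

def route(partition_key):
    """Return the child table whose range covers the partition key."""
    for name, lower, upper in PARTITIONS:
        if lower < partition_key <= upper:
            return name
    raise ValueError("inserted partition key doesn't map to any partition")
```

Adding a partition under the manual scheme means editing this routing table and the trigger together, which is exactly the maintenance burden the new syntax removes.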

Now, from PPAS 9.1 onwards, the user can run a single command to create the partitions and PPAS will take care of everything, as given below:

CREATE TABLE partition_master (id numeric primary key,val text)
PARTITION BY RANGE (id)
(PARTITION partition_child1 VALUES LESS THAN (11),
PARTITION partition_child2 VALUES LESS THAN (21));

edb=# insert into partition_master values(1,'First');
edb=# insert into partition_master values(11,'Eleventh');
edb=# select * from partition_master;
 id |   val    
----+----------
  1 | First
 11 | Eleventh
(2 rows)

edb=# select * from only partition_master_partition_child1;
 id |  val  
----+-------
  1 | First
(1 row)

edb=# select * from partition_master_partition_child2;
 id |   val    
----+----------
 11 | Eleventh
(1 row)

That was easy: with a single command you have your partitioned table in PPAS.

Suppose the user later wants to add one more partition; then he can execute the following single command:

ALTER TABLE partition_master add partition partition_child3 VALUES LESS THAN (31);

edb=# insert into partition_master values(30,'Thirty');
edb=# select * from partition_master;
 id |   val    
----+----------
  1 | First
 11 | Eleventh
 30 | Thirty
(3 rows)

edb=# select * from partition_master_partition_child3;
 id |  val   
----+--------
 30 | Thirty
(1 row)

That's simple, isn't it?

Along with this, the PPAS 9.1 partition syntax also allows swapping an existing table with a partition or subpartition, as given below:

CREATE TABLE single_table (id numeric primary key,val text);
insert into single_table select i, 'val '||i from generate_series(1,10) i;

ALTER TABLE partition_master 
EXCHANGE PARTITION partition_child1
WITH TABLE single_table;

Other partition-management syntaxes are supported as well, such as ALTER TABLE ... DROP PARTITION, SPLIT PARTITION, TRUNCATE PARTITION, and RENAME PARTITION.

Support for the above partition syntaxes has really made management of partitioned tables easier in PPAS 9.1!

New Features in Slony 2.1

Slony 2.1 was released on 19th Oct 2011. So, I thought to look at some important improvements in this new release which can make users' life easier.

Before discussing the changes, let's set up Slony replication.

The following code can be used for setting up Slony replication.

####   Preamble Script:

cluster name=slonytest;
NODE 1 ADMIN CONNINFO = 'dbname=postgres host=localhost user=postgres port=5432 password=postgres';
NODE 2 ADMIN CONNINFO = 'dbname=repdb host=localhost user=postgres port=5432 password=postgres';

Adding Node script:

### create_nodes.slonik

include <preamble.slonik>;
init cluster (id=1, comment='slonytest node 1');
store node (id=2, comment='slonytest subscriber node 2', event node=1);

Storing Path:

### store_paths.slonik

include <preamble.slonik>;
STORE PATH (SERVER=1, CLIENT=2, CONNINFO='dbname=postgres host=localhost user=postgres port=5432 password=password');
STORE PATH (SERVER=2, CLIENT=1, CONNINFO='dbname=repdb host=localhost user=postgres port=5432 password=password');

Master Slon Process conf file:

## Master Slon Conf:

cleanup_interval="10 minutes"
conn_info='dbname=postgres host=localhost user=postgres port=5432 password=postgres'
sql_on_connection="SET log_min_duration_statement TO '1000';"

Slave Slon process conf file:

## Slave Slon Conf:
cleanup_interval="10 minutes"
conn_info='dbname=repdb host=localhost user=postgres port=5432 password=Empid#042'
sql_on_connection="SET log_min_duration_statement TO '1000';"

After creating the above configuration, the user can set up Slony replication.

Now, let's look at the features added in Slony 2.1.

1. Support for adding tables in bulk

In previous versions of Slony, if a user had to add the tables of a particular schema, then he had to write slonik commands as given below:

include <preamble.slonik>;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table (id=1, set id=1, origin=1, fully qualified name='public.dept', comment='slonytest table public.dept');
set add table (id=2, set id=1, origin=1, fully qualified name='public.emp', comment='slonytest table public.emp');
set add table (id=3, set id=1, origin=1, fully qualified name='public.employees', comment='slonytest table public.employees');

This was a bit of work: the user had to list all the tables and add each one to the slonik command, or else write a script to generate the slonik commands.

Now, in Slony 2.1, the user can write a single command to add all tables of a schema to Slony replication, as given below:

include <preamble.slonik>;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table( set id=1, tables='public.*');

A snapshot of the activity is given below:

edbs-MacBook-Pro:slony vibhor$ cat create_set.slonik
include <preamble.slonik>;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table( set id=1, tables='public.*');
edbs-MacBook-Pro:slony vibhor$
edbs-MacBook-Pro:slony vibhor$ slonik < create_set.slonik

The user can verify the tables added in bulk using the following command:

postgres=# select * from _slonytest.sl_table;
 tab_id | tab_reloid | tab_relname | tab_nspname | tab_set |   tab_idxname   | tab_altered |   tab_comment
--------+------------+-------------+-------------+---------+-----------------+-------------+------------------
      1 |      16438 | dept        | public      |       1 | dept_pk         | f           | replicated table
      2 |      16445 | emp         | public      |       1 | emp_pk          | f           | replicated table
      3 |      16995 | employees   | public      |       1 | employees_pkey  | f           | replicated table
      4 |      16987 | entities    | public      |       1 | entities_pkey   | f           | replicated table
      5 |      16456 | jobhist     | public      |       1 | jobhist_pk      | f           | replicated table
      6 |      16967 | person_job  | public      |       1 | person_job_pkey | f           | replicated table
(6 rows)

If the user wants to add all sequences of a particular schema to Slony replication, then he can try something like this:

include <preamble.slonik>;
set add sequence(set id=1, sequences='public.*seq'); ## Adding all sequences of the public schema

The user can also use patterns while adding tables, as given below:

include <preamble.slonik>;
set add table(set id=1, tables='public.slonytest_[1234]', add sequences=true); ## adding tables slonytest_1 through slonytest_4
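The table patterns above behave like regular expressions applied to qualified table names. As a rough illustration of how such a pattern selects tables (an approximation in Python, not Slony's exact matcher):

```python
import re

def match_tables(pattern, qualified_names):
    """Return the qualified names selected by a regex-style table pattern.

    Approximation for illustration: anchor the pattern so it must match
    the whole name, then keep every matching table.
    """
    rx = re.compile(pattern + r"$")
    return [name for name in qualified_names if rx.match(name)]

tables = ["public.slonytest_1", "public.slonytest_2",
          "public.slonytest_9", "public.other"]
```

With this sketch, `'public.slonytest_[1234]'` picks up only `slonytest_1` and `slonytest_2` from the list above, while `'public.*'` picks up everything in the schema.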

2. Implicit WAIT FOR

In previous versions of Slony, the user had to use a WAIT FOR clause before executing commands like MERGE SET/MOVE SET:

     MERGE SET ( ID = 1, ADD ID = 999, ORIGIN = 1 );

Now, in 2.1, if the user forgets the WAIT FOR clause, then Slony will use an implicit WAIT FOR whenever it is required. However, the user has to keep the following in mind:
a. The user should not run multiple slonik scripts in parallel, since running parallel slonik scripts may confuse Slony's implicit WAIT FOR behavior.
b. The slonik command will use WAIT FOR if the event node changes.
c. The slonik commands SUBSCRIBE SET/DROP NODE/CLONE NODE will wait until the primary has caught up with the other slave nodes before executing.
d. The slonik command CREATE SET will wait until all outstanding DROP SET commands are confirmed by the master and slaves in the Slony replication cluster.
e. The WAIT FOR command will not work in a TRY block (an incompatibility with earlier versions).

3. Support for replicating TRUNCATE
Slony 2.1 has introduced the feature of replicating the TRUNCATE command to slaves. TRUNCATE replication is possible for PostgreSQL version >= 8.4.
When a user subscribes a table using Slony 2.1, Slony creates a _slonytest_truncatetrigger trigger on the master to replicate those TRUNCATE commands.
Let's see it work:

edbs-MacBook-Pro:~ vibhor$ psql -c "truncate table person_job" postgres
edbs-MacBook-Pro:~ vibhor$ psql -c "select * from person_job" repdb
 pid | job
-----+-----
(0 rows)

4. Health checks at startup
Slony 2.1 now checks the status of each node during startup and gives a proper message for fixing any issue. In the health check it verifies the configuration information and returns OK if there is no problem. The PL/pgSQL function which has been added for the health check is given below:

CREATE OR REPLACE FUNCTION _slonytest.slon_node_health_check()
 RETURNS boolean
 LANGUAGE plpgsql
AS $function$
declare
		prec record;
		all_ok boolean;
begin
		all_ok := 't'::boolean;
		-- validate that all tables in sl_table have:
		--      sl_table agreeing with pg_class
		for prec in select tab_id, tab_relname, tab_nspname from
		"_slonytest".sl_table t where not exists (select 1 from pg_catalog.pg_class c, pg_catalog.pg_namespace n
				where c.oid = t.tab_reloid and c.relname = t.tab_relname and c.relnamespace = n.oid and n.nspname = t.tab_nspname) loop
				all_ok := 'f'::boolean;
				raise warning 'table [id,nsp,name]=[%,%,%] - sl_table does not match pg_class/pg_namespace', prec.tab_id, prec.tab_relname, prec.tab_nspname;
		end loop;
		if not all_ok then
		   raise warning 'Mismatch found between sl_table and pg_class.  Slonik command REPAIR CONFIG may be useful to rectify this.';
		end if;
		return all_ok;
end;
$function$;

5. Performance improvement in cases of large backlog
Slony 2.1 comes with a performance improvement for cases of large replication backlog. In previous versions, users experienced a performance impact when the sl_log_* tables grew and replicating data to subscribers took so long that Slony needed a huge amount of time to catch up. The main cause of this performance issue was that Slony used a sequential scan while pulling data from the sl_log_* tables. Now, in 2.1, the queries which pull data from sl_log_* have been modified so that they use an index scan.

6. Monitoring thread to provide better monitoring data
2.1 also comes with a monitoring table, sl_components. This table is very useful for monitoring the Slony threads. A description of sl_components can be found at the following link:

A snapshot of the information displayed by sl_components is given below:

postgres=# select * from sl_components;
       co_actor       | co_pid | co_node | co_connection_pid |   co_activity    |       co_starttime        |  co_event  | co_eventtype
----------------------+--------+---------+-------------------+------------------+---------------------------+------------+--------------
 local_listen         |  26688 |       1 |             26692 | thread main loop | 2011-11-03 12:07:56+05:30 |            | n/a
 local_sync           |  26688 |       0 |             26697 | thread main loop | 2011-11-03 12:08:00+05:30 |            | n/a
 remote listener      |  26688 |       2 |             26695 | thread main loop | 2011-11-03 12:08:00+05:30 |            | n/a
 remoteWorkerThread_2 |  26688 |       2 |             26694 | SYNC             | 2011-11-03 12:08:00+05:30 | 5000004917 | SYNC
 local_monitor        |  26688 |       0 |             26698 | thread main loop | 2011-11-03 00:32:19+05:30 |            | n/a
 local_cleanup        |  26688 |       0 |             26696 | cleanupEvent     | 2011-11-03 12:00:15+05:30 |            | n/a
(6 rows)

PL/Perl Getting Data From External File in PostgreSQL/PPAS

Up to PostgreSQL 9.0, PostgreSQL doesn't have any feature which can be used to read an external file (the concept of an external table).

An external table is a database feature with which anyone can read a flat file as if it were a table in the database.

So, I thought to give an idea of how a user can implement this in the database using PL/Perl.

Let’s see how you can implement it.

I have a flat file, which has following data:

Location: /tmp/


Since I know this flat file has two fields (numeric and text), I can write a PL/Perl function which reads the file and extracts the attributes (column values) of each line on the basis of a passed delimiter. With this, if there is any bad record, it will go to a bad file for verification if required.

Following is a sample plperl function:

CREATE OR REPLACE FUNCTION read_external(text,text) returns table(id numeric, col1 text) AS $$
## Written by Vibhor to Read external file
my ($datafile, $delimiter) = @_;
my $badfile = ">>" . $datafile . ".bad";
open (BADFILE, $badfile) or elog(ERROR, "could not open bad file");
open (DATFILE, $datafile) or elog(ERROR, "could not open data file");
while (my $line = <DATFILE>) {
    chomp($line);
    my @fields = split(/$delimiter/, $line);
    if (defined $fields[0] && $fields[0] =~ /^[0-9]+$/) {
        return_next({id => $fields[0], col1 => $fields[1]});
    } else {
        print BADFILE "$line\n";
    }
}
close(DATFILE);
close(BADFILE);
return undef;
$$ language plperlu;

And I have made a view which can be used to SELECT the records of the flat file as a table. Following is the result:

postgres=# create or replace view external_table as select * from read_external('/tmp/',',');
postgres=# select * from external_table;
 id |  col1
----+--------
  1 | Check1
  2 | Check2
  3 | check3
  4 | check4
(4 rows)
And I know there is a bad record, 5a,check5, which I can find in the bad file:

postgres=# \! cat /tmp/

PostgreSQL/Postgres Plus advanced Server Process Details

Following are some of the important processes of PostgreSQL/Postgres Plus Advanced Server.

vibhore@ubuntu:~/edb/edb-postgres$ ps -eaf|grep postgre
1002      1080     1  0 06:07 ?        00:00:00 /opt/PostgresPlus/8.4AS/bin/edb-postgres -D /opt/PostgresPlus/8.4AS/data (Mandatory)
1002      1101  1080  0 06:07 ?        00:00:00 postgres: logger process   (Mandatory)
1002      1103  1080  0 06:07 ?        00:00:01 postgres: writer process                                               (Mandatory)
1002      1104  1080  0 06:07 ?        00:00:01 postgres: wal writer process (Mandatory)
1002      1105  1080  0 06:07 ?        00:00:00 postgres: autovacuum launcher process  (Optional if autovacuum is on)
1002      1106  1080  0 06:07 ?        00:00:00 postgres: stats collector process      (Mandatory)

Above is a snapshot of the processes which exist in PostgreSQL/Postgres Plus Advanced Server.
Postmaster: This is the supervisor among the other processes; it monitors them and spawns new backend processes when user connection requests come in. It is the process responsible for host-based and user-based authentication. This process also checks the consistency of the database and accordingly performs recovery of the PostgreSQL instance.

Logger Process: This process is responsible for logging the details of activity from startup to shutdown of the PostgreSQL instance. The logging behavior of this process can be controlled by the following categories of parameters:
Category: Where to Log: Parameters:
1. log_destination (string): This parameter tells logger process about methods for logging server messages, including stderr, csvlog and syslog.
2. logging_collector (boolean): This parameter allows messages sent to stderr, and CSV-format log output, to be captured and redirected into log files.
3. log_directory (string): This parameter specifies directory of logfile.
4. log_filename (string): When logging_collector is enabled, this parameter sets the file names of the created log files. The value is treated as a strftime pattern, so %-escapes can be used to specify time-varying file names.
5. log_rotation_age (integer): This parameter determines the maximum lifetime of an individual log file.After this many minutes have elapsed, a new log file will be created. Set to zero to disable time-based creation of new log files. This parameter can only be set in the postgresql.conf file or on the server command line.
6. log_rotation_size (integer): This parameter determines the maximum size of an individual log file. After this many kilobytes have been emitted into a log file, a new log file will be created.
7. log_truncate_on_rotation (boolean): This parameter will cause PostgreSQL to truncate (overwrite), rather than append to, any existing log file of the same name. However, truncation will occur only when a new file is being opened due to time-based rotation, not during server startup or size-based rotation.
8. syslog_facility (enum): This parameter determines the syslog “facility” to be used. You can choose from LOCAL0, LOCAL1, LOCAL2, LOCAL3, LOCAL4, LOCAL5, LOCAL6, LOCAL7; the default is LOCAL0. See also the documentation of your system’s syslog daemon.
9. syslog_ident (string): This parameter determines the program name used to identify PostgreSQL messages in syslog logs. The default is postgres.
10. silent_mode (boolean): When this parameter is set, the server will automatically run in the background and disassociate from the controlling terminal. This parameter can only be set at server start.
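Putting the "where to log" parameters together, an illustrative postgresql.conf fragment might look like this (the values are examples, not recommendations):

```
# postgresql.conf -- illustrative logging configuration
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1440          # minutes: rotate daily
log_rotation_size = 10240        # kilobytes: rotate at 10MB
log_truncate_on_rotation = off
```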

Category: When to Log: Parameters:
The following parameters control the message levels:
1. client_min_messages (enum)
2. log_min_messages (enum)
3. log_error_verbosity (enum)
4. log_min_error_statement (enum)
5. log_min_duration_statement (integer)
Details about the above parameter values can be found at the following link:

Category: What to Log: Parameters:
1. Debug Parameters:
debug_print_parse (boolean)
debug_print_rewritten (boolean)
debug_print_plan (boolean)
These parameters enable various debugging output to be emitted. When set, they print the resulting parse tree, the query rewriter output, or the execution plan for each executed query.
2. debug_pretty_print (boolean): This indents the messages produced by debug_print_parse, debug_print_rewritten, or debug_print_plan. This results in more readable but much longer output than the “compact” format used when it is off.
3. log_checkpoints (boolean): This causes checkpoints to be logged in the server log. Some statistics about each checkpoint are included in the log messages, including the number of buffers written and the time spent writing them.
4. log_connections (boolean): This parameter causes each attempted connection to the server to be logged, as well as successful completion of client authentication.
5. log_disconnections (boolean): This outputs a line in the server log, similar to log_connections, but at session termination, and includes the duration of the session.
6. log_duration (boolean): This parameter causes the duration of every completed statement to be logged.
7. log_lock_waits (boolean): It controls whether a log message is produced when a session waits longer than deadlock_timeout to acquire a lock.
8. log_statement (enum): It controls which SQL statements are logged. Valid values are none, ddl, mod, and all. ddl logs all data definition statements, such as CREATE, ALTER, and DROP statements. mod logs all ddl statements, plus data-modifying statements such as INSERT, UPDATE, DELETE, TRUNCATE, and COPY FROM. PREPARE, EXECUTE, and EXPLAIN ANALYZE statements are also logged if their contained command is of an appropriate type.
9. log_temp_files (integer): It controls logging of use of temporary files. Temporary files can be created for sorts, hashes, and temporary query results. A log entry is made for each temporary file when it is deleted.
10. log_timezone (string): It sets the time zone used for timestamps written in the log. Unlike timezone, this value is cluster-wide, so that all sessions will report timestamps consistently.

writer process: This is the background (BG) writer process. It writes dirty shared buffers to disk on the following events:
1. When a checkpoint occurs.
2. When the checkpoint timeout elapses.
3. When the number of WAL segments specified in checkpoint_segments has been filled since the last checkpoint.
4. When there is no space left in shared buffers for new blocks.

Parameters which can help in controlling the background writer's activity are given below:
1. bgwriter_delay (integer): This parameter specifies delay between activity rounds for the background writer. In each round the writer issues writes for some number of dirty buffers (controllable by the following parameters). It then sleeps for bgwriter_delay milliseconds, and repeats.
2. bgwriter_lru_maxpages (integer): This parameter specifies the maximum number of buffers that will be written to disk in each round. Setting this to zero disables background writing (except for checkpoint activity).
3. bgwriter_lru_multiplier (floating point): This parameter is used in estimating the number of buffers that will be needed during the next round. The formula used is given below:
(average recent need) * bgwriter_lru_multiplier
Dirty buffers are written until there are ((average recent need) * bgwriter_lru_multiplier) clean, reusable buffers available.
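As a back-of-the-envelope sketch of that formula (the function and variable names here are mine, not server internals, and the real algorithm also smooths the recent-allocation average):

```python
def buffers_to_clean(avg_recent_need, lru_multiplier, lru_maxpages, clean_available):
    """Estimate how many dirty buffers the bgwriter writes in one round.

    Illustrative only: target a pool of (avg_recent_need * lru_multiplier)
    clean buffers, write only the shortfall, and cap the round's work at
    bgwriter_lru_maxpages.
    """
    target = avg_recent_need * lru_multiplier           # clean buffers we want available
    deficit = max(0, int(target) - clean_available)     # how many are missing
    return min(deficit, lru_maxpages)                   # capped per round
```

This also shows why bgwriter_lru_maxpages = 0 disables background writing: the per-round cap clamps the result to zero regardless of the deficit.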

wal writer process: This process writes all the transactions from the WAL buffer to the WAL files in pg_xlog. It is similar to the log writer in Oracle (which flushes the redo log buffer to the redo logfiles). The parameters which can be used to control the behavior of the WAL writer can be found at the following link:

stats collector process:
The collector can count accesses to tables and indexes in both disk-block and individual-row terms. It also tracks the total number of rows in each table, and information about vacuum and analyze actions for each table. It can also count calls to user-defined functions and the total time spent in each one.
Information on the parameters which can be used to control its activity can be found at the following link:

autovacuum process:
Autovacuum is a process which performs the following activities:
1. To recover or reuse disk space occupied by updated or deleted rows.
2. To update data statistics used by the PostgreSQL query planner.
3. To protect against loss of very old data due to transaction ID wraparound.

Up to 8.2, autovacuum was off by default. From 8.3 onwards it is on by default, which means it is a default process from 8.3 onwards.
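An illustrative postgresql.conf fragment for autovacuum is given below (the values shown are typical defaults; check your version's documentation before relying on them):

```
autovacuum = on                     # enabled by default from 8.3 onwards
log_autovacuum_min_duration = 0     # log every autovacuum action
autovacuum_naptime = 1min           # delay between runs on a database
autovacuum_vacuum_threshold = 50    # min updated/deleted tuples before vacuum
autovacuum_analyze_threshold = 50   # min changed tuples before analyze
```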