Process output of SQL Query from Command Line


Many times I want to send a query to MariaDB (MySQL) from the command line and simply have the output appear on the screen. I don't want to write the whole "connect to database" part of the program. I simply want to issue a command similar to:

mysql databasename < queryFile.sql

And have the results of my query come up. Since I'm pretty good with SQL, it is a quick and dirty way to get some information, and in the case of something I need fairly regularly, but not that often, it is very doable.

For queries I want occasionally, I place the information in an sql file, so I'm not constantly re-inventing the wheel.

However, there are times when SQL is simply not enough. I need to do something that requires more processing than SQL can easily accomplish. That is what processQueryOutput.pl was written for. It is a filter, designed to take the output of an SQL query (or any process that emits a tab delimited file with headers), then does some custom processing and emits that processing.

Basic steps are as follows:

  1. Create and test an SQL query and save it to a file. Test it with the command
     mysql databasename < queryFileName
  2. Create a template file in Perl with one or more of the following items in it
    1. sub doCalc
      1. This has one parameter, a reference to a hash containing the values for one line of output from the query. The keys are the column names (case sensitive and exact, with embedded whitespace and/or special characters).
    2. sub printResults
    3. 0 or more global variables to be used by the above

Now, you can run and process the query with the following command:

mysql dbName < myQuery.sql | ./processQueryOutput.pl templateName

where dbName is, obviously, the name of the Database you are attaching to, myQuery.sql is the SQL file you created and templateName is the name of the template file you created above.

Note: Version 1 of this script loads the entire output of the query into memory, then processes it one line at a time. Thus, if you have a large output, you will use entirely too much memory. I'll probably redo it soon so it processes one line at a time.

Note: I wrote this for me. It assumes the input is a tab delimited series of lines, with the first line containing headers.

How it works:

  1. All lines are read into the array $lines, which is then chomp'ed
  2. sub getHeader is called with the first line. It tears the line apart with split (on tabs), returning an array containing the column names (as defined by the query) in order.
  3. For each additional line of input
    1. Line is loaded into a hash with header1 => value1, header2 =>value2, etc... (and done in a very cute way, BTW, see sub parseALine, thanks perlmonks)
    2. reference to hash is then passed to sub doCalc to process that one line
  4. sub printResults is called

Example

This is a real world example from the database backing the open source accounting package FrontAccounting. This package has a lot of good things, but never has all the reports I want (tne the ad-hoc reporting stinks). However, it is great for my needs otherwise and the developer has a complete ERP diagram just begging to be used.

One of the reports I needed is a projection of recurring invoices for a 12 month period. The query to retrieve that is:

select
      id "id",
      description "description",
      salesorder.total "total",
      debtor.name "client",
      days "days",
      monthly "monthly",
      last_sent "lastSent",
      begin "begin",
      date_add(IF(last_sent='0000-00-00',begin,last_sent), interval invoices.monthly month ) "nextDue",
      month(date_add(IF(last_sent='0000-00-00',begin,last_sent), interval invoices.monthly month )) "nextMonthDue"
from 
      0_recurrent_invoices invoices join 0_sales_orders salesorder using (order_no)
      join 0_debtors_master debtor on (invoices.debtor_no=debtor.debtor_no)

Note: this is my query, and it is definitely NOT correct, but good enough for my purposes. Notice that I alias each column output so I get an easy to use "name" for the processing. Thus, the column last_sent will be known in doCalc as $$hash{'lastSent'} (because that is what I aliased it to).

This query actually returns more columns that I need: All I really need are the columns with aliases client, monthly, and nextMonthDue. However, I may want to use this query for something else in the future, so I write it this way.

This will return output similar to the following (I am ignoring the columns I'm not using)

total	client	monthly	nextMonthDue
248.98	client1	12	9
48.71	client2	3	12
48.71	client3	1	9

 This means client1 pays $248.98 every 12 months, client2 pays $49.71 every three months, and client3 pays $48.71 every month. Client 1 and 3 are next due in September, and client 2 is next due in December.

What I wanted from this was a projection of the total income expected every month, and I could not figure out one simple SQL statement that would do it (if you know of one, pretty please let me know, with explanation). So, with the above example, Sep would have $248.98+$48.71 (client 3 and 1). October would have $48.71 from Client3, November would have the same, but December would have $248.98+48.71+48.71 as all clients are due then.

Example Template

Maybe the word "template" is bad her. Maybe "module" or something. But, what I wanted was an accumulator that would store the sum for each month, and some kind of process that would add calculate and add the appropriate amount. I created a global variable @totals, and wrote code to add values to it in sub doCalc. Then, I told sub printResults to simply dump it to STDOUT

My code follows (and, I'm sure someone could come up with something better).

#! /usr/bin/perl -w
my @totals;

sub doCalc {
   my ( $input ) = @_;
   my $payment = $$input{'nextMonthDue'};
print STDERR "Processing $$input{'client'}\n"; do { $totals[$payment] += $$input{'total'}; $payment += $$input{'monthly'}; $payment %= 12; $payment = 12 unless $payment; } while $payment != $$input{'nextMonthDue'}; } sub printResults { for ( my $month = 1; $month <= 12; $month++ ) { print "$month\t$totals[$month]\n"; } }

 Note the #! /usr/bin/perl is only there so I could test it, but it doesn't hurt anything so I left it in.

sub doCalc takes the hash representing the one line, prints a message to STDERR so we know what is going on, then calculates which months (elements in array @totals) to put the values into (called 'total" here).

sub printResults simply prints the array, showing the month number, then a tab, then the total for that month. Of course, you can format things a lot prettier, and you can create headers.

The source code, the query, some sample data and the template are attached here.

Note: the script doesn't care where the data comes from. You can just as easly redirect STDIN to a file for testing. For these files, use the following command:

./processQueryOutput.pl recurrentInvoiceProjection.template < calc_recurrent_test.csv

and the output would be

 
1	6600.29
2	5820.81
3	6814.6
4	4947.83
5	6768.07
6	6990.85
7	5633.62
8	5820.81
9	7983.5
10	4947.83
11	6712.13
12	6060.39

 

Attached files: processQueryOutput.pl, recurrentInvoiceProjection.template, recurrent_invoices.sql, calc_recurrent_test.csv

Last update:
2014-09-06 04:58
Author:
Rod
Revision:
1.0
Average rating:0 (0 Votes)

You can comment this FAQ

Chuck Norris has counted to infinity. Twice.