Friday, May 4, 2018

Loading a 1% Random Sample


I needed to load a one percent random sample of a large file last week. I’m sure others have figured out this method but this was the first time I coded it like this:
MYSAMPLE:
Load * from MYFILE.qvd (qvd)
Where Rand()<=0.01;

The Rand function returns a random value between 0 and 1. In the load statement above it would only be true 1% of the time (approximately) -- that would load only 1% of the rows. If it was coded as Where Rand()<=0.20 then it would be a 20% random sample, etc.

This method can be used with almost any Load statement even one of the steps in a preceding load. 
Note that we're talking about random numbers here so the total number of rows loaded using this technique will not be exactly 1% of the total rows in the file, although it will be close especially when the total number of rows in the file is very large. Be aware too that the rows that get loaded will not be an evenly distributed sample of rows although it will be close.

  ★★★

Wednesday, March 14, 2018

Rename a Cycle Group



I often use Rob Wunderlich’s excellent Copy Groups utility to copy cycle groups from one QlikView document to another; or to copy a cycle group from a document back into the same document to make a duplicate (to see the utility and get your own copy, search on ”QlikView Cookbook Copy Groups Utility”). But, sometimes after copying the cycle group, the name of the group may not be ideal.

Author's note:  I removed the rest of this post after seeing Andre's comment. Thanks, Andre - that was something I didn't know... and it's much simpler than what I wrote.

Portion of Andre's comment:

A group can be renamed by going to Document Properties | Groups | Select Group | Edit | Enter New Name | OK

This changes where the group is used in a chart dimension as well
 

Friday, January 19, 2018

Utility That Can Extract and Show All of the Variables from Your QlikView Document

Some of our project teams at work are required to use a standard QlikView document for several clients and then use a client-specific loadscript .qvs file to configure custom variables for each client. The loadscript files define the variables with a set of Let statements and I often get questions about the syntax for the Let statements. Some of the variables define complex expressions and they include dollar signs and functions and quotes and some definitions use multiple lines so the Let statement syntax can get tricky.

I recommend that people do their development first so that they have a working document with all of the desired features and variables that are required. Then, they can use a QlikView document I wrote to be used as a utility program to help people with managing their variables.

You can download a working copy of the utility program, named List_Out_Variables.qvw, by clicking HERE

The utility program has a tab with some text instructions and an input box where people type in the complete pathname to the custom .qvw document that they’ve been working on. Then, they do a reload of the utility program and it extracts all of the variables from the custom document.

The input box stores the pathname in a variable named vDocumentPath. Part of the loadscript in the utility program looks like this:
VARIABLES:
LOAD
  [Name] as VARIABLE_NAME,
  [RawValue] as VARIABLE_VALUE
FROM  $(vDocumentPath) (XmlSimple, Table is [DocumentSummary/VariableDescription]);

That loadscript code extracts the variables from the document and makes the variable names and values available to be shown in charts. (I borrowed the loadscript code from a Rob Wunderlich comment I found in QlikCommunity.)

The second tab on the utility is pretty simple, it is table box showing the VARIABLE_NAME and VARIABLE_VALUE fields along with list boxes for those two fields so that people can select the variables they are interested in. The utility is pretty useful right there as a way to analyze variables, sort them, print them, copy and paste to other documents, etc.

There is a third tab with a chart that helps people with the Let statement syntax that I mentioned. The chart on that tab uses VARIABLE_NAME as the dimension but I found it helps to have the dimension column hidden. Then, there is one expression that converts the VARIABLE_VALUE into a Let statement. The expression looks like this:
'Let '&VARIABLE_NAME&'='&chr(39) &
Replace(Replace(Replace(VARIABLE_VALUE,vQUOTE,vQR),vDOLLAR,vDR),vCR,vCRR)
& chr(39)&';'

An example of how that looks in the actual chart is:
Let MoneyFormat=''&chr(36)&'#,##0.00;('&chr(36)&'#,##0.00)';

To make that work I had to define a few variables in the utility program. It made that text expression much simpler to write. The variables in the utility program are:
vDollar is a variable containing a single dollar sign
vQuote is a variable containing a single quote
vCR is a variable containing a carriage return (really simple, when defining this one just put your cursor in the variable definition window and click your Enter key).
vQR is  '&chr(39)&'
vDR is  '&chr(36)&'
vCRR is  '&chr(13)&'      

If you download a copy of the utility program then you don't have to worry about any of that coding as it is already done for you.

After loading a set of variables extracted from your special document, my instructions tell people to make their selections and then, from the chart, they can right-click and export the chart or copy the data to the Windows clipboard so it can be pasted into a .qvs file or pasted into a loadscript or pasted in some other document.
.
Having an easy method to define a set of variables as Let statements can be used to copy variables from document to document or to accompany a version control process where variables and their values need to be stored as text in a way that allows them to be analyzed or restored later.

Some folks familiar with this blog may remember that several years ago I wrote about an Excel spreadsheet containing a macro that could extract QlikView variable names and values. It worked great but is no longer useful in our corporate environment as the corporate anti-malware software considers it a “suspicious program” and refuses to let it run. The List_Out_Variables.qvw utility is a good replacement.


  ★★★

Saturday, May 27, 2017

Data Profile Tool

I’ve written in this blog before about how I’m often called on to validate a new data file from a client or to examine a new data extract. My first step is to load data into a copy of Steve Dark’s Data Profiler (Steve's original post is useful - search on quickintelligence data profiler). The data profiler gives me information about exactly what values exist in each of the fields.

I’ve made a few modifications to Steve’s original document:
  • Added a table box that adjusts itself to the fieldnames in the file
  • Added a small macro that updates the Statistics Box with the fieldname automatically as different fieldnames are selected. A trigger detects the change and automatically configures the Statistics Box
  • Added a tab to help search for duplicate values in either a field or combination of fields
  • A check in the loadscript to see if the file exists and if it does not exist then it opens a msgbox from the loadscript using a function


You can download an example copy of the file data profiler that I use by clicking HERE
.

I also keep a version of the data profiler that is tailored for loading data from an Oracle table and a version of the profiler tailored for loading data from a qvd file. I think I use the data profiler tools almost every day to examine data in files and tables to help with QlikView document design and other reporting requirements.

These two examples have no data loaded - you will have to do a reload yourself from your own data:
  • You can download an example copy of the profiler I use for qvd files by clicking HERE
  • You can download an example copy of the profiler I use for Oracle tables by clicking HERE


  ★★★

Sunday, May 14, 2017

Have you got a function similar to SQL NVL ?


I work with a group of people who are all experts in SQL. They all also have varying levels of technical skills and QlikView skills. One question I get frequently is whether QlikView has a function like the SQL NVL function.

For those of you not familiar with relational database SQL language, the NVL function takes two arguments: a field name or expression and a default value to be returned if the first argument is null. For example, a SQL database query may include the function like this:
  NVL(ORD_DISCOUNT, 0) AS ORD_DISCOUNT
And that would tell the SQL processor to look at the value of ORD_DISCOUNT and if it is null then return 0 as the function value otherwise return ORD_DISCOUNT.

When people ask about achieving the same thing with QlikView, I usually start by telling them that they can code an “if” statement like this:
  If(IsNull(ORD_DISCOUNT), 0, ORD_DISCOUNT)
and then I explain that there is a built-in QlikView function that can be used similar to NVL as long as the field you are checking is supposed to be numeric. It is the Alt function.
The Alt function accepts any number of arguments and looks at each of them going from left to right and returns whichever one is a valid number. So, repeating our example, if ORD_DISCOUNT is null then the following function will return a zero but if ORD_DISCOUNT is a valid number then the function returns ORD_DISCOUNT:
  Alt(ORD_DISCOUNT, 0)   

The Alt function will treat the last or rightmost argument as an “else” condition and return that value if none of the preceding arguments are numeric. The rightmost value need not be numeric so you could code something like this:
  Alt(ORD_DISCOUNT, 'Discount is missing')

The Alt function may be used in the loadscript code or in chart expressions.
If you are interested, the QlikView Help (search in Help for Conditional Functions) shows an interesting example for how the Alt function can be used to identify a date when the date value may be any one of several different date formats.  


  ★★★

Wednesday, April 5, 2017

Converting Edited Number Text into a Numeric Field

My co-worker, Naveen, had a requirement for a document that would load some financial data from a file that a client had sent to us. The client had created the file with a “screen scraping” type of application that captured the number fields as edited numeric text. For example, a field might contain  ($3,046.10)   So, that example includes a dollar sign, comma as a thousands separator, period as a decimal point, and parenthesis to indicate a negative number or credit.

Naveen needed to load the data as a number. Here’s how it was done:
We used two functions. A Num# function converted the text string into a dual field containing both a text portion and a numeric portion. Then, an outer Num function extracted just the numeric portion. For example, one of the number fields was named AUG. In the loadscript, the line that converted the edited number into a simple numeric field looked like this:
Num(Num#(AUG, '$#,##0;($#,##0)'))

The edit string or format code, the part within the single quotes, can vary depending on your requirement. The edit string I used in the example is a good one for the kind of editing you might find in a financial spreadsheet.

Here's a few other examples of edited number text and how it looks after using this expression:
AUG
Num(Num#(AUGUST,'$#,##0;($#,##0)'))
$0
0
$1
1
$1,251
1251
$1,465
1465
$1,466
1466
$3,724
3724
$4,158
4158
($11)
-11
($132)
-132
($2)
-2
($4)
-4
What now?
         (this is a null)
19
19
25.12
25.12
25.1234
25.1234
25.3
25.3
2,070
2070


  ★★★

Wednesday, December 28, 2016

Straight Table with Different Expressions on Each Row

We had a requirement a few months back for a document that would show a table with different expressions on each row along with some subtotals and expressions that reference other rows and columns. It needed to be a single table that could be sent to Excel. It was obviously a requirement that grew out of existing, legacy reporting solutions using Excel but many corporations are wedded to Excel and the larger the corporation, the harder it is to find anyone who believes they have the authority to make changes or do something different.

The solution was to build a document with a straight table with a single dimension that is loaded in the loadscript with an inline load like this:

TBL_KPI_TYPE:
LOAD * INLINE [
KPI_NO, KPI_TYPE
1, Transportation:
2, Intermodal % of Miles
3, Sea %
4, LTL %
5, Avg Pallets/Truck
6,
7, Total Diesel Cost
8, Diesel $/Gallon
9, Diesel # of Gallons
10, Average distance to Customers
];

The straight table used KPI_TYPE as the dimension so those text values get listed in the leftmost column like any other dimension. The chart expression had to be slightly different for each column -- some columns were fiscal months, some fiscal quarters, one was a year-to-date column -- but they all looked something like this:
Pick(KPI_NO,
'  '.
$(v.intermodel_pcent_miles),
$(v.sea_pcent),
$(ltl_pcent),
$(avg_palletspertruck),
'    ',
$(total_diesel),
$(diesel_price),
$(diesel_tot_gallons),
$(avg_dist_to_cust)
)

This is simplified from the actual document.
  • Note that the Pick function in the expression uses KPI_NO which is defined in the inline load. The Pick function avoids “if” statements that can affect speed performance for the table as users change their selections.
  • The dimension isn’t a real data field. Each line in the expression must use the appropriate field names and set analysis to satisfy the requirement for the row. In the example above, variables are used for each row to help make the overall expression understandable at a glance.
  • Because each row is a different expression, the formatting must be done within the expression in the variable using the Num function. Some of those rows are integers, some are in thousands, some are money and some are percentages.
  • Because some of the expressions referred to other rows and columns (using the Above and RangeSum functions), I had to turn off the ability to drag and drop the columns. I also had to lock in the selection of all KPI_TYPE values so that the user wouldn’t accidentally make selections on KPI_TYPE. If the expressions were not using functions like Above then allowing the user to make selections on KPI_TYPE would make the chart more flexible.
  • Note that some of the KPI_TYPE values can be labels and the corresponding row in the expression is just a blank line. (Make sure the chart properties are not going to hide a row of all blanks.)
  • As Aaron mentions in a comment below, something similar can be achieved with a synthetic dimension using ValueList. I think ValueList is better suited to a small number of values but it offers the advantage that a user can't accidentally select one of the values by clicking on the chart; a downside is that it may consume more resources (versus an actual field dimension) when used with a large data model.

★★★