Tool of Thought

APL for the Practical Man

Validating Dates

March 27, 2023

One of the features of ⎕DT is that it validates timestamps and time numbers.

Verifying type 60 and 61 time numbers, as well as timestamps (⎕TS style), takes a bit of computation. Leap years must be determined, different months have different day counts, and there can be no more than 12 months in a year, 24 hours in a day, and so on:

      60 0 ⎕DT 20230101.235959 20010229 20000229 20231301
1 0 1 0

For time numbers that are a (potentially fractional) number of days from some epoch, the testing is much simpler. In fact, it is hard to come up with an invalid value, since almost any input yields a valid date:

      1 0 ⎕DT ¯1 0 1 35654 12345.123456789123456789 123456789123456789.123
1 1 1 1 1 1    

There is no need to check much of anything except perhaps the range, whose limits appear to be somewhere near:

      1 0 ⎕DT ¯2 2*22 62
1 1
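To make the contrast concrete, here is a small Python sketch (not ⎕DT itself) in which a day count becomes a date by plain addition. The 1900-01-01 epoch is a hypothetical choice for illustration, and Python's datetime range (years 1 to 9999) stands in for ⎕DT's limits, which are much wider. With no months or leap years to check, validity reduces to whether the result is representable:

```python
from datetime import datetime, timedelta

# Hypothetical epoch chosen for illustration; not claimed to be ⎕DT's.
EPOCH = datetime(1900, 1, 1)

def day_number_to_datetime(x, epoch=EPOCH):
    """Map a (possibly fractional) day count from `epoch` to a datetime.
    No month lengths or leap years are involved: the addition either
    lands inside datetime's supported range or it does not."""
    return epoch + timedelta(days=x)

def valid_day_number(x, epoch=EPOCH):
    """Validation collapses to 'is the result representable?'."""
    try:
        day_number_to_datetime(x, epoch)
        return True
    except (OverflowError, ValueError):  # out of range, infinity, or NaN
        return False

for x in [-1, 0, 1, 35654, 12345.123456789123456789, 123456789123456789.123]:
    print(x, valid_day_number(x))
```

Only the last value, which ⎕DT accepts, is rejected here, and only because it falls outside datetime's range.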

For a relatively large array, Text2Date is taking an inordinate amount of time using ⎕DT to validate type 60 time numbers. To investigate, here is some slightly modified old code (from the pre-⎕DT era) that validates time numbers in the form YYYYMMDD.HHMMSS:

Val←{
     k←(0,5⍴100)⊤⍵×10*6
     f c←(1752 1 1 0 0 0)(4000 13 32 24 60 60)
     l←(k[1;]=2)∧(0=4|k[0;])=(0=100|k[0;])=0=400|k[0;]
     g←(k[2;]>0)∧k[2;]≤l+31 28 31 30 31 30 31 31 30 31 30 31[11⌊0⌈k[1;]-1]
     (k[5;]=⌈k[5;])∧g∧∧⌿(f(≤⍤¯1)k)∧c(>⍤¯1)k
 } 
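As a cross-check of the logic, here is a scalar Python transliteration of Val that works on one value at a time rather than on a whole array. Two details are assumptions of this sketch rather than part of the original: the input is read through its decimal repr, so binary floating-point noise cannot masquerade as fractional seconds (Val relies on APL's tolerant ⌈ instead), and the names are illustrative:

```python
from decimal import Decimal

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
FLOORS = (1752, 1, 1, 0, 0, 0)         # f in Val (inclusive lower bounds)
CEILINGS = (4000, 13, 32, 24, 60, 60)  # c in Val (exclusive upper bounds)

def is_leap(year):
    """Gregorian rule, as in Val's l: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def val(t):
    """Scalar sketch of Val for one YYYYMMDD.HHMMSS number."""
    # ⍵×10*6, done in decimal so float noise cannot fake fractional seconds
    scaled = Decimal(repr(t)) * 10**6
    if scaled != scaled.to_integral_value():  # fractional seconds: reject
        return False
    n = int(scaled)
    # (0,5⍴100)⊤ : peel off two digits at a time, seconds first
    fields = []
    for _ in range(5):
        n, r = divmod(n, 100)
        fields.append(r)
    ss, mi, hh, dd, mm = fields
    k = (n, mm, dd, hh, mi, ss)  # year, month, day, hour, minute, second
    if not all(f <= x < c for f, x, c in zip(FLOORS, k, CEILINGS)):
        return False
    # day must not exceed the month length, plus one for February in a leap year
    return dd <= DAYS_IN_MONTH[mm - 1] + (mm == 2 and is_leap(n))

print([val(t) for t in (20230101.235959, 20010229, 20000229, 20231301)])
# [True, False, True, False]
```

This agrees with the 1 0 1 0 that ⎕DT returns for the same four values earlier in the post.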

For comparison, let's define ValDT:

ValDT←{
     60 0 ⎕DT ⍵
 }

And run on 50 million integer dates:

      d←18000000+?50E6⍴22E6
      cmpx 'Val d' 'ValDT d'
  Val d   → 5.2E0  |    0% ⎕⎕⎕⎕⎕⎕                                  
  ValDT d → 3.4E1  | +552% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

And Val has not been optimized. If we rewrite it for integers only:

ValInt←{
     k←(0,2⍴100)⊤⍵
     f c←(1752 1 1)(4000 13 32)
     l←(k[1;]=2)∧(0=4|k[0;])=(0=100|k[0;])=0=400|k[0;]
     g←(k[2;]>0)∧k[2;]≤l+31 28 31 30 31 30 31 31 30 31 30 31[11⌊0⌈k[1;]-1]
     g∧∧⌿(f(≤⍤¯1)k)∧c(>⍤¯1)k
 }

Then we get:

      cmpx 'ValInt d' 'ValDT d'
  ValInt d   → 1.4E0  |     0% ⎕                                       
  ValDT d    → 3.8E1  | +2628% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Something is not right here. Maybe there is some check I'm not doing, but with ValDT roughly 27 times slower than ValInt, it looks like there is more going on than that.

Even more inexplicable is how long ⎕DT takes to validate a Dyalog Date Number when the array gets large. Unless I'm missing something, the check should be almost trivial. Maybe it is some sort of memory issue rather than computational inefficiency.
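One way to make sure a hand-rolled validator is not fast simply because it skips checks is to compare its verdicts against an independent one. The Python sketch below transliterates ValInt (the names are hypothetical, and datetime.date stands in for ⎕DT as the reference validator) and checks that the two agree on a small sample drawn the same way as d:

```python
import datetime
import random

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def val_int(d):
    """Sketch of ValInt for one integer YYYYMMDD: arithmetic checks only."""
    rest, dd = divmod(d, 100)
    yyyy, mm = divmod(rest, 100)
    if not (1752 <= yyyy < 4000 and 1 <= mm <= 12 and dd >= 1):
        return False
    return dd <= DAYS_IN_MONTH[mm - 1] + (mm == 2 and is_leap(yyyy))

def val_library(d):
    """Reference check: let the standard library decide, within ValInt's year bounds."""
    rest, dd = divmod(d, 100)
    yyyy, mm = divmod(rest, 100)
    if not 1752 <= yyyy < 4000:
        return False
    try:
        datetime.date(yyyy, mm, dd)  # raises ValueError for bad month or day
        return True
    except ValueError:
        return False

# Same kind of sample as d←18000000+?50E6⍴22E6, but far smaller.
random.seed(0)
sample = [18000000 + random.randint(1, 22_000_000) for _ in range(100_000)]
assert all(val_int(d) == val_library(d) for d in sample)
```

Agreement on a sample only shows that both accept and reject the same dates; it says nothing about speed.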

Grade Down of Grade Up

March 21, 2023

One of the amazing things about old APL books is that no matter how many years have passed, no matter how many new language features have been added, there are invariably many useful nuggets to be gleaned from them. A lot of very smart people used APL back in the day.

Consider the case of ⍒⍋ and its twin ⍒⍒. A while back on the APL Farm there was a discussion of the interpretation and usefulness of these combinations, which culminated with:

So the conclusion is that if we are only dealing in permutations ⍒⍋ is worthless since it's simply a slower ⌽, and if we are not dealing in permutations even Adám doesn't know what ⍒⍋ does so it is probably equally worthless.

I read this at the time with an uneasy feeling that somewhere, sometime, I had a use for ⍒⍋, but promptly forgot about the whole thing. And then just the other day I was refactoring a test for a function that computes the average rank. And lo and behold:

      averageRankUp←{0.5×(⍋⍋⍵)+⍒⍋⌽⍵}

and its sibling:

      averageRankDown←{0.5×(⍋⍒⍵)+⍒⍒⌽⍵}
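For readers who want to trace the two expressions step by step, here is a Python transliteration. grade_up and grade_down are hypothetical helpers mimicking ⍋ and ⍒ with ⎕IO←0, both stable on ties as Dyalog's grades are, and the final +1 shifts the result to the 1-origin ranks the APL versions give under ⎕IO←1:

```python
def grade_up(v):
    """⍋ with ⎕IO←0: stable indices that sort v ascending."""
    return sorted(range(len(v)), key=v.__getitem__)

def grade_down(v):
    """⍒ with ⎕IO←0: stable indices that sort v descending.
    reverse=True keeps equal elements in their original order, as ⍒ does."""
    return sorted(range(len(v)), key=v.__getitem__, reverse=True)

def average_rank_up(v):
    """{0.5×(⍋⍋⍵)+⍒⍋⌽⍵}, shifted to 1-origin ranks."""
    first = grade_up(grade_up(v))          # ⍋⍋⍵: ties ranked in order of appearance
    last = grade_down(grade_up(v[::-1]))   # ⍒⍋⌽⍵: ties ranked in reverse order
    return [0.5 * (a + b) + 1 for a, b in zip(first, last)]

def average_rank_down(v):
    """{0.5×(⍋⍒⍵)+⍒⍒⌽⍵}, shifted to 1-origin ranks (largest value ranks 1)."""
    first = grade_up(grade_down(v))          # ⍋⍒⍵: ties ranked in order of appearance
    last = grade_down(grade_down(v[::-1]))   # ⍒⍒⌽⍵: ties ranked in reverse order
    return [0.5 * (a + b) + 1 for a, b in zip(first, last)]

print(average_rank_up([5, 3, 5]))    # [2.5, 1.0, 2.5]
print(average_rank_down([5, 3, 5]))  # [1.5, 3.0, 1.5]
```

The trick shows in the comments: one term breaks ties by order of appearance, the other breaks them in the opposite order, and the mean of the two is the average rank.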

I wonder if and how average rank could be computed as efficiently or succinctly without ⍒⍋. I know it can be computed using two outer products, but that is very expensive in time, space, and tokens. I also wonder if there are any other documented uses of these pairings. Given the discussion on the APL Farm, where it was observed that ⍒⍋p is ⌽p for permutation p, it is probably no accident that this application of ⍒⍋ involves ⌽.

I was pretty sure I did not come up with these expressions on my own, and a short search of my library revealed that I had most likely lifted them from APL2 in Depth, a 1995 text by legends Norman Thompson and Ray Polivka. Don't get rid of those old APL books!
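Two claims in this post can be checked mechanically in Python: the APL Farm observation that ⍒⍋p is ⌽p for a permutation p, and the fact that average rank can also be computed by brute-force comparison. grade_up and grade_down are hypothetical helpers mimicking ⍋ and ⍒ with ⎕IO←0. The comparison function is one way to do it with two comparison tables standing in for the two outer products; it is not necessarily the formulation the author had in mind, and its quadratic cost is the expense mentioned in the post:

```python
import itertools

def grade_up(v):
    """⍋ with ⎕IO←0 (stable)."""
    return sorted(range(len(v)), key=v.__getitem__)

def grade_down(v):
    """⍒ with ⎕IO←0 (stable)."""
    return sorted(range(len(v)), key=v.__getitem__, reverse=True)

# ⍒⍋p ≡ ⌽p for every permutation p of ⍳n, checked exhaustively for small n.
for n in range(1, 7):
    for p in itertools.permutations(range(n)):
        assert grade_down(grade_up(list(p))) == list(p)[::-1]

def average_rank_outer(v):
    """Quadratic average rank (1-origin): compare every item with every other."""
    less = [sum(y < x for y in v) for x in v]    # row sums of ⍵∘.>⍵
    equal = [sum(y == x for y in v) for x in v]  # row sums of ⍵∘.=⍵
    # a tie group starting after `lt` smaller items occupies ranks lt+1 .. lt+eq
    return [lt + 0.5 * (eq + 1) for lt, eq in zip(less, equal)]

print(average_rank_outer([5, 3, 5]))  # [2.5, 1.0, 2.5]
```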

How Would You Write This?

March 5, 2023

A question was recently posed on the APL Orchard about how the following logic should or could be expressed, given that an :OrIf cannot be mixed with an :AndIf:

       :If 'Linux'≡(⊃# ⎕WG'APLVersion')~'-64'
       :AndIf ~0∊⍴⎕CMD'which git'
       :OrIf 'Windows'≡(⊃# ⎕WG'APLVersion')~'-64'
       :AndIf ~0∊⍴⎕CMD'where git'

Leaving aside the :AndIf/:OrIf issue, this style violates the prime Dyalog Don't directive: Don't use control structures, especially in a situation like this one.

A response was immediately forthcoming, cutting the Gordian knot:

       :If ~0∊⎕CMD' git',⍨⊃'which' 'where'⌽⍨'Windows'≡'-64'~⍨⊃# ⎕WG'APLVersion'

This is a much more APL-style solution, as it removes every control structure except the :If itself, and it is also better from a general programming perspective, as it removes all of the duplicated code. These two improvements are directly related: removing the control structures requires removing the duplicated code. This is precisely why control structures should be avoided. But dangling this much code off the end of an :If statement, or preceding a dfn guard, still leaves a lot to be desired, and violates another Dyalog Don't.

Why not name a function? How much easier on the reader's eyes and brain is it to see:

     :If IsInstalled 'git' 

or

     ~IsInstalled 'git':0 

There is no need for a comment, either outside the function or inside the function. The name of the function and its literal argument say it all. Furthermore, by extracting 'git' as an argument, we have a function that might be useful in many other places, and would make a good candidate for an item in a utility package. Even if the function is hard-coded in the application, and is only used once, the simple act of encapsulating and naming the code is very useful.
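The post does not show the body of IsInstalled, and there is no single right one. As a hypothetical analogue in Python, the standard library's shutil.which already hides the which/where split between Unix and Windows, so the well-named function becomes a one-liner and the call site reads much like the :If above:

```python
import shutil

def is_installed(program):
    """Hypothetical Python analogue of IsInstalled: True if `program`
    resolves to an executable on PATH. shutil.which handles the
    platform differences that the which/where choice dealt with."""
    return shutil.which(program) is not None

if is_installed("git"):
    print("git found")
```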

Once we have a well-named function, we might even say: who cares how the code is written! The fact that the code is in a function, that the function has a good name and a singular purpose, that the arguments are well designed: all of these things are more important than the style or technique of the code inside the function itself. It would be better to have the function named and then coded with all sorts of control structures than to have the naked APL-style solution following the :If statement.

The benefits of an application composed of small, well-named and designed functions cannot be overstated.
