Tool of Thought

APL for the Practical Man

"You don't know how bad your design is until you write about it."

Validating Dates

March 27, 2023

One of the features of ⎕DT is that it validates timestamps and time numbers.

Verifying type 60 and 61 time numbers, as well as timestamps (⎕TS style) takes a bit of computation. Leap years must be determined, there are different day counts for different months, no more than 12 months in a year, 24 hours in a day, etc:

      60 0 ⎕DT 20230101.235959 20010229 20000229 20231301
1 0 1 0

For time numbers that are a (potentially fractional) number of days from some epoch, the testing is much simpler. In fact it is hard to come up with an invalid value when almost any input yields a valid date:

      1 0 ⎕DT ¯1 0 1 35654 12345.123456789123456789 123456789123456789.123
1 1 1 1 1 1    

There is no need to check much of anything except perhaps the range, which appears to be somewhere near:

      1 0 ⎕DT ¯2 2*22 62
1 1

For a relatively large array, Text2Date is taking an inordinate amount of time using ⎕DT to validate type 60 time numbers. To investigate, some slighty modified old code (from the pre ⎕DT era) that handles time numbers in the form YYYYMMDD.HHMMSS:

Val←{
     k←(0,5⍴100)⊤⍵×10*6
     f c←(1752 1 1 0 0 0)(4000 13 32 24 60 60)
     l←(k[1;]=2)∧(0=4|k[0;])=(0=100|k[0;])=0=400|k[0;]
     g←(k[2;]>0)∧k[2;]≤l+31 28 31 30 31 30 31 31 30 31 30 31[11⌊0⌈k[1;]-1]
     (k[5;]=⌈k[5;])∧g∧∧⌿(f(≤⍤¯1)k)∧c(>⍤¯1)k
 } 

For comparison, let's define ValDT:

ValDT←{
     60 0 ⎕DT ⍵
 }

And run on 50 million integer dates:

      d←18000000+?50E6⍴22E6
      cmpx 'Val d' 'ValDT d'
  Val d   → 5.2E0  |    0% ⎕⎕⎕⎕⎕⎕                                  
  ValDT d → 3.4E1  | +552% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

And Val has not been optimized. If we rewrite it for integers only:

ValInt←{
     k←(0,2⍴100)⊤⍵
     f c←(1752 1 1)(4000 13 32)
     l←(k[1;]=2)∧(0=4|k[0;])=(0=100|k[0;])=0=400|k[0;]
     g←(k[2;]>0)∧k[2;]≤l+31 28 31 30 31 30 31 31 30 31 30 31[11⌊0⌈k[1;]-1]
     g∧∧⌿(f(≤⍤¯1)k)∧c(>⍤¯1)k
 }

Then we get:

      cmpx 'ValInt d' 'ValDT d'
  ValInt d   → 1.4E0  |     0% ⎕                                       
  ValDT d    → 3.8E1  | +2628% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Something is not right here. Maybe there is some check I'm not doing, but it looks like there is more going on than that.

Even more inexplicable is how long ⎕DT takes to validate a Dyalog Date Number when the array gets large. Unless I'm missing something, the check should be almost trivial. Maybe it is some sort of memory issue rather than computational inefficiency.