Tool of Thought

APL for the Practical Man

Validating Dates

March 27, 2023

One of the features of ⎕DT is that it validates timestamps and time numbers.

Verifying type 60 and 61 time numbers, as well as timestamps (⎕TS style) takes a bit of computation. Leap years must be determined, there are different day counts for different months, no more than 12 months in a year, 24 hours in a day, etc:

      60 0 ⎕DT 20230101.235959 20010229 20000229 20231301
1 0 1 0

For time numbers that are a (potentially fractional) number of days from some epoch, the testing is much simpler. In fact it is hard to come up with an invalid value when almost any input yields a valid date:

      1 0 ⎕DT ¯1 0 1 35654 12345.123456789123456789 123456789123456789.123
1 1 1 1 1 1    

There is no need to check much of anything except perhaps the range, which appears to be somewhere near:

      1 0 ⎕DT ¯2 2*22 62
1 1

For a relatively large array, Text2Date is taking an inordinate amount of time using ⎕DT to validate type 60 time numbers. To investigate, some slighty modified old code (from the pre ⎕DT era) that handles time numbers in the form YYYYMMDD.HHMMSS:

Val←{
     k←(0,5⍴100)⊤⍵×10*6
     f c←(1752 1 1 0 0 0)(4000 13 32 24 60 60)
     l←(k[1;]=2)∧(0=4|k[0;])=(0=100|k[0;])=0=400|k[0;]
     g←(k[2;]>0)∧k[2;]≤l+31 28 31 30 31 30 31 31 30 31 30 31[11⌊0⌈k[1;]-1]
     (k[5;]=⌈k[5;])∧g∧∧⌿(f(≤⍤¯1)k)∧c(>⍤¯1)k
 } 

For comparison, let's define ValDT:

ValDT←{
     60 0 ⎕DT ⍵
 }

And run on 50 million integer dates:

      d←18000000+?50E6⍴22E6
      cmpx 'Val d' 'ValDT d'
  Val d   → 5.2E0  |    0% ⎕⎕⎕⎕⎕⎕                                  
  ValDT d → 3.4E1  | +552% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

And Val has not been optimized. If we rewrite it for integers only:

ValInt←{
     k←(0,2⍴100)⊤⍵
     f c←(1752 1 1)(4000 13 32)
     l←(k[1;]=2)∧(0=4|k[0;])=(0=100|k[0;])=0=400|k[0;]
     g←(k[2;]>0)∧k[2;]≤l+31 28 31 30 31 30 31 31 30 31 30 31[11⌊0⌈k[1;]-1]
     g∧∧⌿(f(≤⍤¯1)k)∧c(>⍤¯1)k
 }

Then we get:

      cmpx 'ValInt d' 'ValDT d'
  ValInt d   → 1.4E0  |     0% ⎕                                       
  ValDT d    → 3.8E1  | +2628% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Something is not right here. Maybe there is some check I'm not doing, but it looks like there is more going on than that.

Even more inexplicable is how long ⎕DT takes to validate a Dyalog Date Number when the array gets large. Unless I'm missing something, the check should be almost trivial. Maybe it is some sort of memory issue rather than computational inefficiency.