Tool of Thought

APL for the Practical Man

DOM via JSON Performance

April 12, 2023

In the previous post we looked at building an APL DOM from a ⎕XML-style matrix using ⎕JSON rather than recursively creating namespaces by hand. Now let's look at some performance characteristics.

First let's create a relatively large document and convert it to HTML:

            v←25000 20⍴⍕¨⍳500000
            t←A.NewTable v
            c←A.Cells t
      d←LargeDoc 0
      h←A.DOM2HTML d 

Now let's convert the HTML to a ⎕XML matrix, and test the performance of the two techniques from the previous post for converting a ⎕XML matrix to a ⎕JSON matrix:

      xm←⎕XML h
      cmpx 'XM2JM_NoLoop xm' 'XM2JM_Loop xm'
  XM2JM_NoLoop xm → 5.2E¯1 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕                                
  XM2JM_Loop xm   → 2.4E0  | +372% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
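
For anyone who wants to repeat timings like these: cmpx is not a primitive but a utility from the dfns workspace supplied with Dyalog. A minimal sketch of loading and using it, assuming that workspace is on the workspace search path (the two expressions timed here are throwaway illustrations, not part of this post's code):

      'cmpx' ⎕CY 'dfns'             ⍝ copy the timing utility into the active workspace
      cmpx '+/⍳1000' '+⌿⍳1000'      ⍝ each character vector is one expression to time

Each row of the report shows the time per execution in seconds, followed by the percentage difference relative to the first expression.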

While it's no surprise that the non-looping algorithm is faster, it is remarkable that the looping technique is not that bad. In addition, no attempt has been made to squeeze any more performance out of the non-looping function beyond the initial act of removing the loop, so there may be more speed to be had.

Of more interest is the relative performance of the complete task of using ⎕JSON versus recursion to build the DOM:

      HTML2DOMviaJSON←{
                ⎕JSON(⎕JSON⍠'M')XM2JM ⎕XML ⍵
      }
      cmpx 'HTML2DOMviaJSON h'  'A.HTML2DOM h'
  HTML2DOMviaJSON h → 9.4E0  |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                   
* A.HTML2DOM h      → 1.8E1  | +91% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Note that cmpx marks the second expression with an asterisk because the results of the two functions don't match: they are namespaces, not arrays. Note further that ⎕JSON creates a tree in which the child namespaces are true children of their parent, unlike this particular recursive technique:

        d1←HTML2DOMviaJSON h
        d2←A.HTML2DOM h 
#.[JSON object].[JSON object].[JSON object].[JSON object]
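
The difference in parentage can be seen on a small scale. The following is a sketch only: the JSON text and the names j, p and kid are made up for illustration, and the hand-built half shows just one way a recursive builder might end up with children that are not true children. It is not a description of what A.HTML2DOM actually does.

      j←⎕JSON '{"a":{"b":{"c":1}}}'   ⍝ import builds nested namespaces
      j.a.b.c
1
      j.a.b.##≡j.a                    ⍝ the parent of b is a: a true child
1
      p←⎕NS ''                        ⍝ hand-built alternative
      p.kid←⎕NS ''                    ⍝ created in the current space; p merely holds a ref
      p.kid.##≡p                      ⍝ so the parent of kid is not p
0

Creating the child with p.⎕NS '' instead would make it a true child of p.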

Recursion is clearly slower, but more testing should be done on reasonably sized and more realistic documents, especially docs with more depth.