Автономная некоммерческая организация высшего образования «Университет Иннополис»

ВЫПУСКНАЯ КВАЛИФИКАЦИОННАЯ РАБОТА (БАКАЛАВРСКАЯ РАБОТА)
по направлению подготовки 09.03.01 «Информатика и вычислительная техника»

GRADUATION THESIS (BACHELOR'S GRADUATION THESIS)
Field of Study: 09.03.01 «Computer Science»

Направленность (профиль) образовательной программы: «Информатика и вычислительная техника»
Area of Specialization / Academic Program Title: «Computer Science»

Тема / Topic: Манипуляция параллельным по построению деревом в Haskell / Parallel-by-construction tree manipulation in Haskell

Работу выполнил / Thesis is executed by: Дворянов Даниил Сергеевич / Dvorianov Danyil

Руководитель выпускной квалификационной работы / Supervisor of Graduation Thesis: Маццара Мануэль / Mazzara Manuel

Консультанты / Consultants: Кудасов Николай Дмитриевич / Kudasov Nikolai

Иннополис, Innopolis, 2021
Contents

1 Introduction
  1.1 Trees
  1.2 Different Approach to AST
  1.3 My Contribution
2 Literature Review
  2.1 Compilers in Haskell
    2.1.1 Functional quantum programming language
  2.2 The Key to a Data Parallel Compiler
    2.2.1 Algorithm outline
    2.2.2 Key operator
  2.3 Haskell Array Computations
  2.4 Stack Package Tool
3 Methodology
  3.1 Technologies used
  3.2 Haskell
  3.3 Stack
  3.4 Accelerate
  3.5 Node coordinates
  3.6 Design and Architecture
    3.6.1 Node coordinates construction
    3.6.2 Finding closest focus ancestors
    3.6.3 Key operator
4 Implementation
  4.1 Project configuration
  4.2 Types
  4.3 General functions
    4.3.1 Inner product
    4.3.2 Select rows
  4.4 Node coordinates construction
    4.4.1 Tree to array representation
    4.4.2 Depth vector to node coordinates
  4.5 Towards closest focus ancestors
    4.5.1 Find nodes of type
    4.5.2 Parent matrix
    4.5.3 Closest focus ancestors
  4.6 Key
5 Evaluation and Discussion
  5.1 Measuring Performance
  5.2 Comparison
    5.2.1 Interface
    5.2.2 Performance
  5.3 Future work
6 Conclusion
Bibliography cited
List of Tables

I    Depth Vector example
II   Construction of node coordinates
III  Node coordinates of nodes of type E (Expression)
IV   Parent coordinates matrix
V    Finding closest ancestor
VI   Depth vector representation
VII  Constructing node coordinates
VIII Coordinates of nodes of type ‘Expr’
IX   Parent coordinates matrix
X    Constructing closest focus ancestors matrix
XI   Result of key operator - using nested arrays
XII  Result of key operator - using segment descriptor
List of Figures

2.1  Example AST
2.2  Node coordinates grouped by expression nodes
3.1  Sample AST
4.1  AST Data Structure
4.2  Inner product function
4.3  Select rows function
4.4  Padding to equal length code
4.5  Tree to vector conversion code
4.6  Node coordinates construction code
4.7  Find nodes of specified type
4.8  Get parent coordinates function
4.9  Function for finding closest ancestors of specified type
4.10 2-dimensional key operator
4.11 N-dimensional key operator signature
5.1  Recursive algorithm for finding closest ancestors
5.2  Simple tree generation
Abstract

This research is inspired by the work of A. Hsu on parallel compilers in APL [1]. His findings are interesting, although they might be underappreciated due to the unpopularity and complexity of APL¹. Therefore, I attempt not only to replicate his results, but to provide a purely functional, Haskell-based implementation of parallel compilation steps, such as flattening expressions or lifting functions.

My implementation, although it needs optimisation, provides easy-to-use functions for the construction and manipulation of an array-based representation of a tree. Moreover, with little to no effort, a variety of its functions may be used standalone, especially the general ones, such as innerProduct or selectRows.

My implementation is available on GitHub via the link: https://github.com/DanProgrammer1999/parallel-compiler-haskell

¹ Wikipedia: https://en.wikipedia.org/wiki/APL_(programming_language)
Chapter 1
Introduction
In this research I took inspiration from the article by A. Hsu [1]. The author describes an efficient and novel approach to an important compilation step using APL. Aaron's main idea is that the resulting program eliminates branching and recursion from parsing, which allows it to run efficiently on a CPU or GPU.

In my research, I attempt not only to replicate his approach in Haskell with the Accelerate library, but to create a purely functional and modular interface to efficient array-based syntax tree manipulation. For this purpose, I provide a toolset of functions and operations on a parallel-by-construction tree representation.

To understand the importance of my research goal, let me first describe several concepts closely related to the topic.
1.1 Trees

Trees are one of the most important concepts in programming, second in importance probably only to lists and arrays. They are used for a variety of tasks, such as compilers, JSON parsing, fast searching, and many others.
In general, a tree is a mathematical structure similar to a graph [2]. In fact, it can be described as a simple, connected, acyclic, undirected graph.

One of the most common definitions of a tree in computer science is a collection of nodes, where each node has a value and a possibly empty list of children. This is a recursive definition: each child is also a node and can have its own list of child nodes, and so on. Therefore, the most convenient and straightforward way to operate on a tree is with recursive algorithms.
For example, a traditional AST is traversed and transformed multiple
times during various compiler stages, such as flattening expressions, lifting
functions, folding constants, and others. With traditional approaches, most
of the traversals require recursion over the whole tree and conditional branching. However, both recursion and branching prevent compilers from efficient,
parallel execution on multi-core CPU or GPU devices.
Historically, Haskell has been an excellent choice for dealing with trees, as well as other recursive structures. Haskell's focus on recursion optimization, together with its convenient pattern matching mechanisms, allows for straightforward and efficient tree manipulation algorithms. Moreover, since ASTs result from parsing a language, we can use an extensive Haskell tool set for implementing a parser, further streamlining the process.
However, even though the recursive approach is the most straightforward, it is not the most efficient one. Moreover, it is almost impossible to make a recursive algorithm run in parallel.
1.2 Different Approach to AST

A recursive structure is not the only way to represent a tree in software. Much like a graph can be described using an adjacency matrix, a 2-dimensional array, it is also possible to come up with an efficient way to describe a tree using arrays. Depending on the purpose, there are a variety of ways to do so.
In the case of an AST, an abstract syntax tree, A. Hsu [1] has designed a very efficient structure for storing and operating on a tree. He describes node coordinates: a matrix of values where each row describes a unique address, almost a path, to a node. This makes it easy to find out whether one node is a sibling of another, whether it appears higher than the other, whether it is a direct or an indirect ancestor of the other, etc.
By applying different transformations and operations to the node coordinates matrix, it is possible to perform several compiler stages, such as expression flattening, function lifting, and constant folding, without any recursion or branching.
The algorithm consists of several steps:

• Convert the recursive representation to an array representation with three columns: the node's depth in the tree, its type, and its value. A. Hsu [1] uses preorder traversal¹ for this.

• Convert the depth vector to a node coordinate matrix. This includes several steps:

  – construct a boolean matrix from the depth vector, that is, a matrix where each row corresponds to a depth and has 1 at i = depth and 0 everywhere else;

  – compute the cumulative sum along the columns of the matrix obtained in the previous step;

  – finally, drop the values (replace them with 0) in each row at positions i > depth.

• Find the node coordinates of the nodes we are interested in (e.g., in the case of expression flattening, nodes with type ‘Expr’).

• Transform the node coordinates matrix into a parent coordinates matrix, where each row specifies the coordinates of the node's parent. This is easy to do: just replace the last non-zero element of each row with zero.

• Using the parent coordinates matrix, compute an inner product with the focus nodes, such that for each row i, each element j is 1 if focus nodes_j is an ancestor of node coordinates_i, and 0 otherwise. The parent coordinates matrix is used to ensure that a node is not marked as an ancestor of itself.

• Using the result of the previous step, find for each row the column index of the last non-zero element.

• The indices obtained in the previous step signify the closest ancestor of the desired type. Use this vector as indices into the node coordinates matrix to construct the matrix of closest focus ancestors.

• Finally, use the key operator to group the node coordinates by the closest focus ancestors.

¹ Wikipedia: https://en.wikipedia.org/wiki/Tree_traversal
1.3 My Contribution

In this research, I attempt not only to replicate the results of A. Hsu [1] for compiler stages, but to provide a declarative-style, purely functional, and easy-to-use interface to array-based operations on any tree. This work might be the beginning of a purely functional library of functions and operators in Haskell which allows one to avoid dealing with the complexities of array programming, while retaining the efficiency and parallelisation capabilities of an array programming language.

Therefore, my research question is: is it possible to build a purely functional, high-level Haskell interface to a highly parallelizable array computation language (such as Accelerate)? And is it possible to do so without sacrificing efficiency?
Chapter 2
Literature Review
In this chapter I review the work already done on parallel-by-construction trees, as well as explain the purpose and novelty of my work.
2.1 Compilers in Haskell

One of the reasons for choosing Haskell as the primary language for this research is that the language is often considered a great tool for writing a compiler. There exist many efficient, easy-to-use libraries [3] for various stages of compilation. Moreover, the declarative style and expressivity of the Haskell language make it a good choice for writing compilers.
2.1.1 Functional quantum programming language

One example of a compiler written in Haskell is the Quantum programming language (QML) [4]. J. Grattage uses Haskell to implement a functional language for quantum computations. This language can be used to write functional-style programs and compile them to an annotated quantum circuit.
2.2 The Key to a Data Parallel Compiler

Research of particular importance to this topic is described in article [1]. A. Hsu presents an approach to an inherently data-parallel compiler and provides the details of its implementation in an array programming language (APL). His idea is simple but powerful. Let me describe it briefly here with an example (fig. 2.1) from the article [1].
2.2.1 Algorithm outline

• Given an AST, construct a depth vector, as in table I.

• Construct the node coordinate matrix, where each row is a unique coordinate for a node. This can be done in three steps (shown by example in table II):

  – For the given depth vector, construct a corresponding boolean matrix with dimensions (node count × max depth). We can describe each row as a function of the corresponding depth d: f(d) = [if d = i then 1 else 0 | i ∈ [0, max depth)]. You can see this matrix in part 1 of table II.

  – Accumulate the values along the columns. In other words, in each column j, X′_{0,j} = X_{0,j} and X′_{i,j} = X′_{i−1,j} + X_{i,j}. You can see the result in part 2 of table II.

  – Finally, we need to fix some inaccuracies introduced by the previous step. Since each row is a unique address and can be understood as a path from the root to the node, each row must be no longer than the depth of the corresponding node. In other words, ∀ i ∈ [0, node count), j ∈ (depth vec[i], max depth): X_{i,j} = 0. You can see the result in part 3 of table II.

• Find the coordinates of the focus nodes (e.g. either expression or function nodes). See table III for the given example.

• Transform the node coordinates so that each coordinate represents the coordinate of its parent. It is easy to do: just set the last non-zero element in each row to 0. See table IV.

• The next important part is finding the closest ancestor among the focus nodes. See table V for an example of the described steps.

  – First, we construct a new matrix where each row i is a boolean vector whose element j denotes whether focus nodes_j is an ancestor of node i. In other words, X_{i,j} = and [parent nodes_{i,k} == focus nodes_{j,k} || focus nodes_{j,k} == 0 | k ∈ [0, max depth)], where `and` simply folds an array with logical ‘and’.

  – In the resulting matrix, the index (into the focus nodes) of the closest ancestor is the index of the rightmost non-zero element. To get it, we can replace each non-zero element with its index and then simply take the maximum of each row. We get an array of indices into the focus nodes matrix.

  – Finally, construct the matrix of closest ancestors by indexing the focus nodes matrix with the array from the previous step.

• Using the ancestor matrix, group the nodes with the same ancestor (see the example in fig. 2.2).
Figure 2.1: Example AST
Depth  Type  Value
0      E     v
1      E     0
2      V     a
2      P     +
2      V     b
1      P     /
1      E     0
2      V     c
2      P     *
2      V     d

Table I: Depth Vector example
• Now that we have grouped the whole tree into several independent chunks, we can give names to these subgroups and reference them in other subgroups. This helps to reduce the nesting and avoid recursion and branching (which perform poorly on a GPU).
2.2.2 Key operator

In his work, A. Hsu emphasizes the Key operator [5] as central to his strategy. This operator was first introduced in the J language [6], and subsequently implemented in some other programming languages, such as APL. The idea is straightforward: given any two arrays a and b of the same outer dimension and a function f, key finds the groups of equal rows in a and applies f to each group together with the corresponding rows of b. However, the Haskell Accelerate library does not provide a built-in key operator, so I will attempt to implement it in my project.
Part 1: boolean matrix from the depth vector
1 0 0
0 1 0
0 0 1
0 0 1
0 0 1
0 1 0
0 1 0
0 0 1
0 0 1
0 0 1

Part 2: cumulative sums along the columns
1 0 0
1 1 0
1 1 1
1 1 2
1 1 3
1 2 3
1 3 3
1 3 4
1 3 5
1 3 6

Part 3: values at positions beyond the node's depth dropped
1 0 0
1 1 0
1 1 1
1 1 2
1 1 3
1 2 0
1 3 0
1 3 4
1 3 5
1 3 6

Table II: Construction of node coordinates
1 0 0
1 1 0
1 3 0
Table III: Node coordinates of nodes of type E (Expression)
0 0 0
1 0 0
1 1 0
1 1 0
1 1 0
1 0 0
1 0 0
1 3 0
1 3 0
1 3 0

Table IV: Parent coordinates matrix
Part 1: inner product of the parent matrix with the focus nodes
0 0 0
1 0 0
1 1 0
1 1 0
1 1 0
1 0 0
1 0 0
1 0 1
1 0 1
1 0 1

Part 2: index of the rightmost non-zero element in each row
0 0 1 1 1 0 0 2 2 2

Part 3: closest focus ancestors
1 0 0
1 0 0
1 1 0
1 1 0
1 1 0
1 0 0
1 0 0
1 3 0
1 3 0
1 3 0

Table V: Finding closest ancestor
Figure 2.2: Node coordinates grouped by expression nodes
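To make these grouping semantics concrete, below is a minimal, list-based sketch in plain Haskell (not Accelerate); the name keyList and its exact shape are my own illustration rather than the J or APL definition.

import Data.List (nub)

-- A list-based sketch of the key operator's grouping semantics:
-- collect the elements of `vals` whose corresponding element of
-- `keys` is equal, and apply `f` to each (key, group) pair.
keyList :: Eq k => (k -> [v] -> r) -> [k] -> [v] -> [r]
keyList f keys vals =
  [ f k [ v | (k', v) <- zip keys vals, k' == k ] | k <- nub keys ]

For instance, keyList (\k vs -> (k, length vs)) [1, 2, 1] "abc" evaluates to [(1, 2), (2, 1)].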
2.3 Haskell Array Computations

There are many Haskell libraries which implement efficient array data structures. The most popular choices are the Accelerate [7] and REPA [8] libraries. Their APIs are similar, and both present an embedded language (i.e. a language with syntax identical or similar to Haskell's, but with custom compilers). REPA provides a compiler which targets CPU execution, while Accelerate provides compilers which target both CPU¹ and GPU², among others.

Since Accelerate offers more flexibility and more compilation targets, I decided to choose it as the main Haskell library for this research.
2.4 Stack Package Tool

To simplify the development and management of multiple packages, I decided to use the Stack [9] package tool. Among other features, it allows one to create a new project with boilerplate files included, automatically install GHC in an isolated location, and build the project from a specification file.

The main configuration file is package.yaml, which allows developers to specify project meta-information, such as the author's name or the license type, as well as various parameters for different parts of the project, such as dependencies, GHC options, and others.
¹ http://hackage.haskell.org/package/accelerate-llvm-native
² https://hackage.haskell.org/package/accelerate-llvm-ptx
By default, Stack divides the codebase into three logical parts: library, executables, and tests. Library files are located by default in the src folder and contain the main logic of the program, while executable files are found under the app folder and serve as an entry point for the application. The executables could also contain simple command-line or graphical interface logic. Test files are located under the test folder; however, I did not implement any test suite for this project.
Chapter 3
Methodology
In this chapter I describe the main architectural decisions, challenges, and technologies used for the implementation of the system.
3.1 Technologies used

In the course of this study, I used several technologies to implement and operate on various data structures. The main technologies behind the implementation are Haskell and the Accelerate library. Their usage is described in detail below.

I also used the Haskell project management tool Stack [9] to simplify package management and to reuse best practices and structures for working with a Haskell project.
3.2 Haskell

Since the goal of this research is to replicate and benchmark in Haskell the results achieved by A. Hsu [1], the language is the main focus of the implementation.
Similarly to some other programming languages like C++ or Python, Haskell is not a concrete implementation or system by itself. Instead, it is a language standard [10] that any implementation must follow. However, even though various implementations of Haskell [11] exist, the GHC compiler [12] is considered the de-facto standard, state-of-the-art compiler traditionally associated with the Haskell language.

For this project, I used the latest release to date, ghc-8.10.3.
3.3 Stack

The main file of the implementation is src/AccTree.hs, which contains the main logic of the program. Since Accelerate redefines most standard functions and operators which are included in Prelude and implicitly imported in all files, the Accelerate documentation¹ suggests explicitly importing Prelude as P with the qualified keyword. This means that the Prelude functions, operators, and structures are not added to the namespace, but are accessible via the P qualifier (e.g. P.map or P.==).

Another file, src/Tree.hs, contains the basic definition of the Tree type, as well as its Functor instance implementation and several utility functions. Additionally, a few utility functions are contained in the src/Utils.hs file. I did not define them in src/AccTree.hs simply because the qualified Prelude namespace makes the definition of the utility functions much more cumbersome and harder to read and understand.

Finally, the files Demo.hs and Benchmarks.hs contain various functions for testing and benchmarking, which are described in detail in the next chapter.

¹ https://hackage.haskell.org/package/accelerate-1.3.0.0/docs/Data-Array-Accelerate.html
3.4 Accelerate

Accelerate is an embedded array language for high-performance computing in Haskell [7]. "Embedded" means that the language has separate standalone compilers. Indeed, the package provides several backends targeting several platforms, in particular CPU and GPU. The idea behind the language is simple. All data are represented as arrays of instances of the Elt class, which include signed and unsigned Int, Bool, and Char, as well as tuples and shapes. The arrays have a fixed size, which is required to be specified upon their creation. The size of an array, its shape, and an index into it are all specified using a so-called snoc-list, which is just a sequence of elements starting with the zeroth element Z and separated by the :. operator (e.g. Z :. 1 :. 2 :. 3). More interestingly, this structure serves both as a data structure to define array sizes and construct indexes, and as a type constructor to specify an array type. As such, array dimensionality is always specified by a type signature and is checked at compile time.

To perform any operation on an array, one must first make the data accessible to Accelerate via the function use. Depending upon which backend is used to eventually execute the array computations, use may entail a data transfer. This function also "wraps" an array in the Acc type, which is required by all array manipulation functions of Accelerate.

As an embedded language, an Accelerate program is not compiled by GHC. Instead, each Accelerate backend is a runtime compiler which generates and executes parallel code for the target platform at application runtime via the function run. So the operations on an array are combined, optimized, and then executed at runtime.
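As a minimal illustration of use and run (a sketch assuming the CPU backend from accelerate-llvm-native), the following doubles every element of a small matrix:

import Data.Array.Accelerate as A
import qualified Data.Array.Accelerate.LLVM.Native as CPU

-- Build a 2x3 matrix, move it into Accelerate with `use`,
-- double every element, and execute on the CPU backend.
doubled :: Matrix Int
doubled = CPU.run (A.map (* 2) (use m))
  where
    m = fromList (Z :. 2 :. 3) [1 ..] :: Matrix Int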
3.5 Node coordinates

The node coordinates matrix is the central data structure in the parallel compiler implementation. The matrix consists of multiple rows, where each row, a node coordinate, is a vector of natural numbers with length equal to the depth of the tree. Every node in the tree is uniquely represented by a node coordinate vector; each such vector is zero-padded on the right, with the number of non-zero elements corresponding to the depth of the node.

The node coordinate matrix is lexicographically ordered and has dimensions N×D, where N is the total number of nodes in the tree and D is the depth of the tree. Moreover, each subsequent row of the matrix differs from its predecessor in only one non-zero element.

One important property, which is not available in a recursive representation, is that every node coordinate is a prefix (ignoring zeros) of the coordinates of each of its descendants; conversely, each node coordinate contains the coordinates of all of its ancestors as prefixes. This means that it is possible to identify node relations such as siblings, ancestors, and descendants in constant time.
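This prefix property can be stated as a one-line predicate; the following is a sketch on plain Haskell lists (the name isAncestorCoord is mine, for illustration), mirroring the test used in the inner product of chapter 4.

-- Coordinate `a` denotes an ancestor of the node with coordinate `b`
-- iff every non-zero element of `a` equals the corresponding element
-- of `b` (the zeros of `a` are right padding).
isAncestorCoord :: [Int] -> [Int] -> Bool
isAncestorCoord a b = and (zipWith (\x y -> x == 0 || x == y) a b)

For example, isAncestorCoord [1,1,0] [1,1,3] is True, while isAncestorCoord [1,2,0] [1,1,3] is False.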
3.6 Design and Architecture

3.6.1 Node coordinates construction
As described above, several steps are needed to group the nodes using
node coordinates.
To better illustrate those steps, consider the sample expression (3.1) with its AST (fig. 3.1), and observe how it is transformed step by step. For the purposes of this research, I simply treat the AST as the input, already built somewhere else. This allows for flexibility: I do not assume a specific syntax, a specific AST, or even a specific purpose; in fact, this approach can be used for something other than compiler construction.

(x ∗ 2 + 1)/(3 + y)    (3.1)

Figure 3.1: Sample AST
1. Step 0: the tree is converted to vector form, so that each node is represented by its depth, type, and value (table VI). The resulting type and value arrays are padded to the same length, and the data is converted to Accelerate arrays.

2. Step 1: create a boolean matrix where, for each row i, element (i, depth vector[i]) = 1 and all other values are 0.

3. Step 2: accumulate the values in the columns of the matrix from the previous step.

4. Step 3: remove the extra numbers, so that the count of non-zero elements in each node coordinate corresponds to the depth of that node. Table VII shows these steps.
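Before moving to Accelerate, the three construction steps can be prototyped on plain Haskell lists; the sketch below (the names are mine) mirrors them directly, and on the depth vector [0,1,2,2,2,1,1,2,2,2] of table I it yields exactly part 3 of table II.

import Data.List (transpose)

-- Reference implementation of steps 1-3 on plain lists,
-- assuming depths start at 0 (the Accelerate version is in chapter 4).
nodeCoordsList :: [Int] -> [[Int]]
nodeCoordsList depths = zipWith clip depths summed
  where
    width      = maximum depths + 1
    -- Step 1: one row per node, with a 1 in the column of its depth
    boolRow d  = [ if j == d then 1 else 0 | j <- [0 .. width - 1] ]
    -- Step 2: cumulative sums along every column
    summed     = transpose (map (scanl1 (+)) (transpose (map boolRow depths)))
    -- Step 3: zero out the entries to the right of the node's depth
    clip d row = [ if j <= d then x else 0 | (j, x) <- zip [0 ..] row ]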
3.6.2 Finding closest focus ancestors

Another important algorithm is used to find the closest ancestors of a certain type, the focus nodes. Similarly to node coordinate construction, this algorithm consists of several parts. However, these parts are independent and can be used separately for other purposes.

1. Find the focus nodes given their type (‘Expr’ in the example). The result for the sample tree is illustrated in table VIII.

2. Construct the parent matrix. This matrix is used to ensure that a node is not marked as an ancestor of itself. As mentioned in section 3.5, each node coordinate contains its ancestors' coordinates as prefixes. Therefore, to accomplish this step, simply set the last non-zero element in each row to zero. The parent matrix for the sample AST is illustrated in table IX.

3. Construct the matrix of closest focus ancestors, where each row i is the node coordinate of the closest focus ancestor of node i. This can be done in several steps, described below, illustrated in table X, and sketched in code after table VI:

   • Compute an inner product between the parent coordinate matrix and the focus node matrix, with a boolean function that tests whether one vector is a prefix of the other, ignoring zeros. Each element (i, j) of the resulting matrix specifies whether or not the j-th focus node is an ancestor of the i-th node.

   • Construct a vector of indices into the focus nodes which specifies the closest ancestor of each node from the parent matrix. To do that, one can simply replace the non-zero elements of the matrix from the previous step with their column index, and take the maximum of each row.

   • Finally, using the vector of indices obtained in the previous step, construct a matrix where each row specifies the closest focus ancestor of the corresponding node.
Depth  Type  Value
0      Expr
1      Op    /
2      Expr
3      Op    +
4      Expr
5      Op    *
6      Var   x
6      Num   2
4      Num   1
2      Expr
3      Op    +
4      Num   3
4      Var   y

Table VI: Depth vector representation
3.6.3 Key operator

As mentioned in chapter 2, the key operator is crucial as the last step of Hsu's algorithm. Ideally, after applying the key operator to the closest focus ancestors and the node coordinates matrix, one gets a result similar to the one in table XI. However, this data structure is impossible in Accelerate: each grouped segment has a different length, and Accelerate does not allow nested arrays. One solution to this problem is used throughout Accelerate itself: simply return a tuple with a segment descriptor² as the second element. A segment descriptor is simply a vector which describes the length of each segment in the returned array. So the result looks similar to table XII.

² https://hackage.haskell.org/package/accelerate-1.3.0.0/docs/Data-Array-Accelerate.html#t:Segments
(a) Step 1: boolean matrix from the depth vector
1 0 0 0 0 0 0
0 1 0 0 0 0 0
0 0 1 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 0 0
0 0 0 0 0 1 0
0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 1 0 0
0 0 1 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 0 0
0 0 0 0 1 0 0

(b) Step 2: cumulative sums along the columns
1 0 0 0 0 0 0
1 1 0 0 0 0 0
1 1 1 0 0 0 0
1 1 1 1 0 0 0
1 1 1 1 1 0 0
1 1 1 1 1 1 0
1 1 1 1 1 1 1
1 1 1 1 1 1 2
1 1 1 1 2 1 2
1 1 2 1 2 1 2
1 1 2 2 2 1 2
1 1 2 2 3 1 2
1 1 2 2 4 1 2

(c) Step 3: extra values dropped
1 0 0 0 0 0 0
1 1 0 0 0 0 0
1 1 1 0 0 0 0
1 1 1 1 0 0 0
1 1 1 1 1 0 0
1 1 1 1 1 1 0
1 1 1 1 1 1 1
1 1 1 1 1 1 2
1 1 1 1 2 0 0
1 1 2 0 0 0 0
1 1 2 2 0 0 0
1 1 2 2 3 0 0
1 1 2 2 4 0 0

Table VII: Constructing node coordinates

1 0 0 0 0 0 0
1 1 1 0 0 0 0
1 1 1 1 1 0 0
1 1 2 0 0 0 0

Table VIII: Coordinates of nodes of type ‘Expr’

0 0 0 0 0 0 0
1 0 0 0 0 0 0
1 1 0 0 0 0 0
1 1 1 0 0 0 0
1 1 1 1 0 0 0
1 1 1 1 1 0 0
1 1 1 1 1 1 0
1 1 1 1 1 1 0
1 1 1 1 0 0 0
1 1 0 0 0 0 0
1 1 2 0 0 0 0
1 1 2 2 0 0 0
1 1 2 2 0 0 0

Table IX: Parent coordinates matrix

(a) Inner product of the parent matrix with the focus nodes
0 0 0 0
1 0 0 0
1 0 0 0
1 1 0 0
1 1 0 0
1 1 1 0
1 1 1 0
1 1 1 0
1 1 0 0
1 0 0 0
1 0 0 1
1 0 0 1
1 0 0 1

(b) Step 2: indices into the focus node matrix
0 0 0 1 1 2 2 2 1 0 3 3 3

(c) Step 3: matrix of closest focus ancestors
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 1 1 1 0 0
1 1 1 1 1 0 0
1 1 1 1 1 0 0
1 1 1 0 0 0 0
1 0 0 0 0 0 0
1 1 2 0 0 0 0
1 1 2 0 0 0 0
1 1 2 0 0 0 0

Table X: Constructing closest focus ancestors matrix
Key 1 0 0 0 0 0 0:
  1 0 0 0 0 0 0   0  Expr
  1 1 0 0 0 0 0   1  Op    /
  1 1 1 0 0 0 0   2  Expr
  1 1 2 0 0 0 0   2  Expr

Key 1 1 1 0 0 0 0:
  1 1 1 1 0 0 0   3  Op    +
  1 1 1 1 1 0 0   4  Expr
  1 1 1 1 2 0 0   4  Num   1

Key 1 1 1 1 1 0 0:
  1 1 1 1 1 1 0   5  Op    *
  1 1 1 1 1 1 1   6  Var   x
  1 1 1 1 1 1 2   6  Num   2

Key 1 1 2 0 0 0 0:
  1 1 2 2 0 0 0   3  Op    +
  1 1 2 2 3 0 0   4  Num   3
  1 1 2 2 4 0 0   4  Var   y

Table XI: Result of key operator - using nested arrays

1 0 0 0 0 0 0
1 1 0 0 0 0 0
1 1 1 0 0 0 0
1 1 2 0 0 0 0
1 1 1 1 0 0 0
1 1 1 1 1 0 0
1 1 1 1 2 0 0
1 1 1 1 1 1 0
1 1 1 1 1 1 1
1 1 1 1 1 1 2
1 1 2 2 0 0 0
1 1 2 2 3 0 0
1 1 2 2 4 0 0

Segment descriptor: 3 2 5 3

Table XII: Result of key operator - using segment descriptor
Chapter 4
Implementation
This chapter further describes the details of the implementation.
4.1 Project configuration

As mentioned above, this project uses the Stack [9] package manager. It allows one to specify all project-related configuration details in package.yaml, and extra dependencies in stack.yaml.

Dependencies for this project include accelerate, along with accelerate-llvm-native for CPU targeting, and criterion¹ for benchmarking.

¹ https://hackage.haskell.org/package/criterion

4.2 Types
In the previous chapter, it was established that the input to our system is an AST. Therefore, I implemented several simple data structures to represent one, along with Functor, Foldable, and Show instances to simplify various operations on trees. You can see the type specifications in fig. 4.1.
data Tree a = Tree
  { treeRoot     :: a
  , treeChildren :: [Tree a]
  }

data ASTNode = ASTNode
  { nodeType  :: String
  , nodeValue :: String
  }

type AST = Tree ASTNode
Figure 4.1: AST Data Structure
The instance implementations are intentionally omitted, since they are mostly trivial and not essential to the solution.
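As a small illustration of these types (a hypothetical value, not part of the codebase), the AST of the expression a + b can be written as:

-- A hypothetical example: the AST of the expression a + b.
exampleAST :: AST
exampleAST =
  Tree (ASTNode "Expr" "")
    [ Tree (ASTNode "Op" "+")
        [ Tree (ASTNode "Var" "a") []
        , Tree (ASTNode "Var" "b") []
        ]
    ]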
4.3 General functions

Throughout the implementation, several patterns and operations emerge. These operations are common enough to be abstracted into separate general functions. Moreover, implementing separate functions for common operations allows for easier optimization: instead of optimizing the same operation throughout the codebase, where it could be implemented in a variety of ways, one only needs to focus on one function to improve the overall efficiency of the library.

This section lists and describes such functions in my codebase.
4.3.1 Inner product

The first common function is the inner product. It is similar to the matrix product², but instead of multiplying rows by columns, it multiplies rows by rows.

² https://en.wikipedia.org/wiki/Matrix_multiplication
innerProduct :: (Elt a, Elt b, Elt c)
=> (Exp a -> Exp b -> Exp c)
-> (Exp c -> Exp c -> Exp c)
-> Acc (Matrix a)
-> Acc (Matrix b)
-> Acc (Matrix c)
innerProduct prodF sumF a b = fold1 sumF $ zipWith prodF aExt bExt
where
(I2 r1 _) = shape a
(I2 r2 _) = shape b
aExt = replicate (lift (Z :. All :. r2 :. All)) a
bExt = replicate (lift (Z :. r1 :. All :. All)) b
Figure 4.2: Inner product function
Also, instead of multiplication for the elements and summation for the rows, the two combining functions are given as parameters: the first is a product function (how to combine two elements from the two matrices), and the second is a sum function (how to fold the row of results into a single element). That is, given two matrices A_{m,n} :: Matrix a and B_{k,n} :: Matrix b, and two combination functions f :: a -> b -> c and g :: c -> c -> c, the element (i, j) of the resulting matrix is obtained by combining rows A_i and B_j with f and folding the resulting vector with g. In practice, since Accelerate does not facilitate easy operations on segments, one can implement the inner product in Accelerate by replicating both matrices to the same dimensions. See the implementation in fig. 4.2 for more details.
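As a usage sketch (the name matMul is mine, for illustration): an ordinary matrix product can be recovered by transposing the second operand, since the rows of the transpose are the columns of the original matrix.

-- Hypothetical usage of innerProduct: plain matrix multiplication.
matMul :: Acc (Matrix Int) -> Acc (Matrix Int) -> Acc (Matrix Int)
matMul a b = innerProduct (*) (+) a (transpose b)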
4.3.2 Select rows

Another useful general function is selectRows. The logic is very simple: given a matrix and a vector of indices, return a new matrix whose rows are selected by the index vector. Since the index vector is constrained only by the maximum value it may contain (the number of matrix rows minus one), selectRows has multiple applications, such as selecting a subset of matrix rows, replicating rows into a bigger matrix in a given order, or simply rearranging the existing rows.
selectRows :: (Elt a) => Acc (Vector Int) -> Acc (Matrix a) -> Acc (Matrix a)
selectRows rowIndex arr = zipWith (\i j -> arr ! I2 i j) rowIndexMat colIndexMat
where
nResultRows = size rowIndex
(I2 _ nCols) = shape arr
rowIndexMat = replicate (lift (Z :. All :. nCols)) rowIndex
colIndexMat = replicate (lift (Z :. nResultRows :. All))
$ enumFromN (I1 nCols) (0 :: Exp Int)
Figure 4.3: Select rows function
The implementation is quite simple: replicate the index vector as a column, replicate the column indices as a row, and zip the two matrices with a function that indexes into the original array. See the implementation in fig. 4.3.
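One such application, as a sketch (the name reverseRows is mine): rearranging rows by feeding selectRows a descending index vector.

-- Hypothetical usage of selectRows: reverse the order of the rows.
reverseRows :: Elt a => Acc (Matrix a) -> Acc (Matrix a)
reverseRows arr = selectRows (enumFromStepN (I1 n) (n - 1) (-1)) arr
  where
    (I2 n _) = shape arr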
4.4 Node coordinates construction

4.4.1 Tree to array representation

To be able to work with this data in Accelerate, we first need to convert it to arrays. There are several steps involved. First, the tree needs to be transformed into the depth vector representation, which can be accomplished with a simple recursive function (see fig. 4.5). The second step pertains to transforming the type and value data. Since Accelerate does not support the String type, the only solution is to use a Char matrix for storing both the types and the values of the nodes. But this requires all strings in a matrix to be of the same length, which is not guaranteed in the case of types or values. Therefore, in order to enable this conversion, we first need to make all types and values be of equal length by padding them with some character. I chose '\0', since it is guaranteed not to be used inside the strings themselves. For this, I implemented simple helper functions in the file Utils.hs to avoid namespace conflicts. See fig. 4.4 for the implementation.
padRight :: Int -> a -> [a] -> [a]
padRight n e arr
  | length arr == n = arr
  | otherwise       = arr ++ replicate (n - length arr) e

padToEqualLength :: a -> [[a]] -> [[a]]
padToEqualLength e arr = map (padRight maxLength e) arr
  where
    maxLength = maximum $ map length arr
Figure 4.4: Padding to equal length code
The function padRight pads an array with the given symbol to the specified length, while padToEqualLength takes an array of arrays (e.g. an array of Strings, since String is equivalent to [Char]) and uses the first function to pad all the arrays to equal length.
After the padding is done, the depth, type, and value lists are converted to arrays. This can be done using the fromList function of the Accelerate library. See the complete implementation of the described steps in fig. 4.5.
4.4.2 Depth vector to node coordinates

The next transformation is the construction of node coordinates, which was explored in depth in section 3.6.1. The implementation simply aims to replicate those steps in the Accelerate language. See fig. 4.6 for the code. Several points need clarification, though. The scan functions in Accelerate³ are similar to the scans in pure Haskell⁴, but they run along the innermost dimension of Accelerate arrays. That is, if an array is 2-dimensional, each row is scanned separately. The function scanl1 is a variant of scan that does not use an initial value. The function the ‘unpacks’ the value of a single-element array, returned in this case by the maximum function. The function generate is important as one of the ways to create an Accelerate array: it constructs an array from a shape and an index-to-value mapping.

³ https://hackage.haskell.org/package/accelerate-1.3.0.0/docs/Data-Array-Accelerate.html#g:35
⁴ https://hackage.haskell.org/package/base-4.15.0.0/docs/Data-List.html#g:6
treeToVectorTree :: AST -> VectorTree
treeToVectorTree tree = (depthAcc, typesAcc, valuesAcc)
where
treeToList currLevel (Tree root children)
= (currLevel, nodeType root, nodeValue root)
: P.concatMap (treeToList (currLevel + 1)) children
listTree = treeToList 0 tree
depthVector = P.map (\(d, _, _) -> d) listTree
types = padToEqualLength '\0' $ P.map (\(_, t, _) -> t) listTree
values = padToEqualLength '\0' $ P.map (\(_, _, v) -> v) listTree
maxLength arr = P.maximum (P.map P.length arr)
depthAcc = fromList (Z :. P.length depthVector) depthVector
typesAcc = fromList (Z :. P.length types :. maxLength types) (P.concat types)
valuesAcc = fromList (Z :. P.length values :. maxLength values) (P.concat values)
Figure 4.5: Tree to vector conversion code
The final transformation step is simply to transfer the node coordinates, types, and values into an Accelerate value. To convert an array, one simply applies the function use to it. Depending on the chosen backend, this may entail a transfer to a CPU or GPU device.
constructNodeCoordinates :: Acc (Vector Int) -> Acc (Matrix Int)
constructNodeCoordinates depthVec = nodeCoordinates
  where
    maxDepth  = the $ maximum depthVec
    nodeCount = size depthVec
    depthMatrix = generate
      (I2 nodeCount (maxDepth + 1))
      (\(I2 i j) -> boolToInt (depthVec !! i == j))
    cumulativeMatrix = transpose $ scanl1 (+) $ transpose depthMatrix
    dropExtraNumbers (I2 i j) e = boolToInt (j <= depthVec !! i) * e
    nodeCoordinates = imap dropExtraNumbers cumulativeMatrix
Figure 4.6: Node coordinates construction code
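Putting the pieces together, here is a sketch of the full conversion pipeline (the wrapper name is mine; it assumes the CPU backend from accelerate-llvm-native):

import qualified Data.Array.Accelerate.LLVM.Native as CPU

-- From a recursive AST to its node coordinate matrix on the CPU.
nodeCoordsOf :: AST -> Matrix Int
nodeCoordsOf ast = CPU.run (constructNodeCoordinates (use depths))
  where
    (depths, _types, _values) = treeToVectorTree ast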
4.5 Towards closest focus ancestors

Finally, with the tree represented as Accelerate data, one can apply various transformations to it. If the goal is grouping expressions, the next steps are the following:

• Find the focus nodes, that is, nodes of some particular type useful for compiler optimization, such as Expr.

• Construct the parent matrix: a matrix where each row is the parent of the corresponding node.

• Compute the inner product of the matrices from the previous steps; find the largest index of a non-zero element in each row, and use the resulting vector to select rows from the focus nodes matrix.
findNodesOfType :: [Char] -> NCTree -> Acc (Matrix Int)
findNodesOfType query (T3 nc types _)
= selectRows selector nc
where
queryAcc = use (fromList (Z :. P.length query) query)
-- either index is out of bounds, and the element is zero, or elements match
isValidCharacter (I2 _ j) c =
(j >= size queryAcc && c == constant '\0') || c == queryAcc ! I1 j
selector = afst
$ filter (>= 0)
$ imap (\(I1 i) v -> boolToInt v * (i + 1) - 1)
$ and (imap isValidCharacter types)
Figure 4.7: Find nodes of specified type
4.5.1 Find nodes of type

This operation is quite easy: simply compare each row element-by-element with the query and summarise each row with the and operation, obtaining a boolean vector that specifies which rows match the query and which do not. From there, replace each matching element with its index, filter out the non-matching elements, and finally use the resulting vector to select the node coordinates. See the implementation in fig. 4.7.
4.5.2 Parent matrix

The second input for generating the closest focus ancestor matrix is the parent matrix: the matrix of the parents of each corresponding node coordinate. Due to the properties of node coordinates, one can simply replace the last non-zero element with zero to obtain a parent coordinate. This is exactly what the getParentCoordinates function accomplishes. See the implementation in fig. 4.8.
getParentCoordinates :: Acc (Matrix Int) -> Acc (Matrix Int)
getParentCoordinates nc = imap (\i e -> boolToInt (not (isReplacedWithZero i)) * e) nc
where
(I2 _ nCols) = shape nc
isReplacedWithZero (I2 i j) = j + 1 == nCols || nc ! I2 i j == 0
Figure 4.8: Get parent coordinates function
findAncestorsOfType :: [Char] -> NCTree -> Acc (Array DIM2 Int)
findAncestorsOfType query tree@(T3 nc _ _)
= selectRows closestAncestorIndexVec focusNodes
where
focusNodes = findNodesOfType query tree
parentCoords = getParentCoordinates nc
closestAncestorIndexVec = fold1 max $ imap
(\(I2 i j) e -> boolToInt e * j)
(innerProduct (\a b -> a == b || b == 0) (&&) parentCoords focusNodes)
Figure 4.9: Function for finding closest ancestors of specified type
4.5.3 Closest focus ancestors

Finally, after finding the focus nodes and obtaining the parent matrix, it is possible to compute the matrix of closest focus ancestors. To accomplish this, one computes the inner product of the parent coordinates matrix with the focus nodes matrix, using the zero-ignoring equality test (\a b -> a == b || b == 0) as the product function for combining two elements of the different matrices, and (&&) as the sum function to fold a row of results into a single value. As a result, a boolean matrix is produced, where each element (i, j) specifies whether or not focus node j is an ancestor of node i. Replacing each true element with its column index and each false element with 0, and selecting the maximum in each row, one obtains a vector of indices of the focus nodes which are the closest focus ancestors of the corresponding nodes. Using selectRows, one finally obtains the matrix of the closest focus ancestors for each node. See the implementation in fig. 4.9.
4.6 Key

The key operator is an important part of the algorithm of A. Hsu. He relies on the native APL implementation of the key operator for easy and efficient grouping. Unfortunately, the Accelerate library does not provide such an operator, so it was my task to implement it. However, due to the complexity of Accelerate and the limited time, I was only able to partially implement a 2-dimensional version of the key operator.

While the key operator by definition groups the values matrix by the equal rows of the key matrix, applying the given function to each group of keys and values to get the resulting rows, my implementation currently only functions as a grouping operation. In other words, while the operator accepts a row transformation function, that function currently remains unused.

This was the hardest part to implement, since Accelerate does not provide any tools for operations on rows, only on the values themselves or on the array as a whole. Of course, there are functions like scan and fold, but those operate on elements rather than on rows.

On the other hand, the grouping part works and produces correct results. Once again, it involves both the innerProduct and selectRows functions.

To implement the grouping in key, one might begin by finding the equal rows in keys. This can be done simply by computing the inner product of keys with itself, using (==) as the product function and (&&) as the sum function. As a result, a matrix of groups is generated, where each element (i, j) is true if row j is equal to row i. In other words, each row of the matrix of groups shows which other rows are equal to it.
Next, one can replace the true elements with their column number (and the false ones with -1) and fold the matrix by selecting the minimum non-negative number. This generates a vector which contains possibly repeating indices of unique rows, and it can easily be reduced to unique values. Since the rows of equal keys in the matrix of groups are themselves equal, and the first match in the first encounter of a unique row is the index of that row itself, the resulting vector has an interesting property: the first encounter of a number n is necessarily at index n. That is, to obtain a vector of indices of the unique rows in the matrix of groups, one simply needs to fold each row to the minimum non-negative value, replace in the resulting vector each value that differs from its index by -1, and filter the result with the (>= 0) predicate.

Using the obtained vector, one can select the unique rows from the groups matrix and filter them for matching elements. As a result, a vector is generated which contains the arrangement of the rows in the values array. Moreover, as the second component of the returned tuple, filter provides a segment descriptor, which in this case describes the number of value rows in each group. Finally, one can simply select the rows using the earlier obtained vector.
See the implementation in fig. 4.10.
Even though I was not able to implement the n-dimensional key operator, I have designed a type signature for it, shown in fig. 4.11.
key2 :: (Elt k, Elt v, Elt r, Eq k)
=> (Acc (Vector k) -> Acc (Vector v) -> Acc (Vector r))
-> Acc (Matrix k)
-> Acc (Matrix v)
-> Acc (Matrix v, Segments Int)
key2 _ keys vals = T2 (selectRows selectors' vals) descriptor
where
(I2 nKeysRows nKeysCols) = shape keys
-- each row contains indexes of equal rows
-- groupsMatrix: nKeysRows x nKeysRows
groupsMatrix = imap (\(I2 _ j) v -> boolToInt v * (j + 1) - 1)
$ innerProduct (==) (&&) keys keys
-- -1 marks a non-match: if either value is -1, keep the other (max); otherwise keep the smaller index (min)
chooseMinId a b = if a < 0 || b < 0 then max a b else min a b
uniqueRowsIdxVec = afst
$ filter (>= 0)
$ imap (\(I1 i) n -> boolToInt (i == n) * (n + 1) - 1)
$ fold1 chooseMinId groupsMatrix
(T2 selectors' descriptor) = filter (> 0) $ selectRows uniqueRowsIdxVec groupsMatrix
Figure 4.10: 2-dimensional key operator
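As a usage sketch (the wrapper name is mine), the last step of Hsu's algorithm, grouping the node coordinates by their closest ‘Expr’ ancestors, can then be expressed as:

-- Group node coordinates by their closest focus ancestors; the
-- row-transformation argument of key2 currently remains unused.
groupByAncestor :: NCTree -> Acc (Matrix Int, Segments Int)
groupByAncestor tree@(T3 nc _ _) =
  key2 (\_ vs -> vs) (findAncestorsOfType "Expr" tree) nc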
key :: (Shape sh, Shape sh', Elt k, Elt v, Elt r)
=> (Acc (Array sh k) -> Acc (Array sh' v) -> Acc (Array sh' r))
-> Acc (Array (sh :. Int) k)
-> Acc (Array (sh' :. Int) v)
-> Acc (Array (sh' :. Int) r, Vector Int)
key f keys vals = undefined
Figure 4.11: N-dimensional key operator signature
Chapter 5
Evaluation and Discussion
5.1 Measuring Performance

To measure the performance of the system, I used Criterion¹, a popular Haskell benchmarking library. It provides a simple but powerful way to measure the efficiency of code.

To compare with the array-based implementation, I implemented Functor and Foldable instances for the tree using recursion, which made it easy to manipulate a tree and create recursive analogues of the Accelerate-based operations. The implementation is mentioned in chapter 4.

Using these instances, I created a simple recursive algorithm for finding the closest focus ancestors. See the implementation in fig. 5.1.

Finally, I needed a large enough data sample to avoid statistical errors. Therefore, I implemented a simple function which, given a number of levels, creates a simple tree of sum expressions. See the implementation in fig. 5.2.

¹ https://hackage.haskell.org/package/criterion
findAncestorsOfTypeRec :: AST -> [Char] -> [(ASTNode, ASTNode)]
findAncestorsOfTypeRec tree targetType =
let root = treeRoot tree
in findClosestAncestors tree root [(root, root)]
where
findClosestAncestors (Tree root []) _ res = res
findClosestAncestors node@(Tree root children) currClosest res
= mconcat [
newRes,
mconcat $ map (\node -> findClosestAncestors node newClosest []) children
]
where
newClosest = if nodeType root == targetType then root else currClosest
newRes = res ++ zip (map treeRoot children) (repeat newClosest)
Figure 5.1: Recursive algorithm for finding closest ancestors
buildNLevelAST :: Int -> Int -> AST
buildNLevelAST width n
| n <= 0 = astLeafNode "Num" "42"
| otherwise =
Tree (ASTNode "Expr" "")
[
Tree (ASTNode "Op" "+")
(replicate width (buildNLevelAST width (n - 2)))
]
Figure 5.2: Simple tree generation
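For reference, a minimal Criterion harness over these helpers might look as follows (a standalone sketch with unqualified Prelude; the tree size is an arbitrary choice):

import Criterion.Main (bench, defaultMain, whnf)

-- Benchmark the recursive closest-ancestor search on a generated tree.
main :: IO ()
main = defaultMain
  [ bench "recursive closest ancestors" $
      whnf (\t -> length (findAncestorsOfTypeRec t "Expr")) tree
  ]
  where
    tree = buildNLevelAST 2 16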
5.2 Comparison

5.2.1 Interface

One of the goals of this research was to create a purely functional Haskell interface to array-based computations. The implementation proposed by A. Hsu [1] is efficient and important to the development of compilers, but it has a problem: the programming language that the author uses, APL², while efficient and concise, is not general-purpose and is hard to understand for those who have not worked with it. The language uses a large variety of characters to represent operations. This might have a negative effect on the appreciation of his findings.

Therefore, in my code I tried to amend this problem by providing a solution in a more readable and general-purpose programming language, Haskell, as well as providing a purely functional interface to all operations. As a result, one simply needs to define or provide an AST, and can then use any of the defined functions as easily as writing pure Haskell code.

² https://www.dyalog.com/what-is-dyalog.htm
5.2.2 Performance

As mentioned above, I created benchmarks testing the Accelerate implementation against the recursive one. Unfortunately, the Accelerate implementation is much less efficient: the benchmarks show the recursive implementation to be faster by several orders of magnitude (16 ms against 14 ns). But this does not mean that the approach itself is wrong. There are multiple reasons for the lack of efficiency, such as:
• A lack of time and knowledge on my part led to inefficient implementations of various functions.

• The Accelerate documentation does not mention the efficiency of its functions, so there may be some functions which are more efficient, and some functions which should not be used, or used as little as possible.

• I benchmarked finding the closest focus ancestors, and not the key operator, which might prove to be more efficient in Accelerate than in the recursive implementation.
5.3 Future work

As mentioned above, the current results do not necessarily signify the unfitness of this approach for Accelerate or Haskell. On the contrary, they leave considerable room for optimizations and improvements to the current code.

For example, one might want to measure each Accelerate function and draw conclusions about their efficiency. Another option is to implement the most crucial functions, such as selectRows or innerProduct, natively in C++ or even in LLVM instruction code.

Another possible research direction is to generalize the current functions and implement additional ones to provide, ideally, a library of parallel computations in pure Haskell.

Overall, if the current toolset is both optimised and expanded with new functions, it could become an important step towards a library for parallel compilation in pure Haskell.
Chapter 6
Conclusion
In the course of my research I have successfully implemented a novel approach to parallel compilation in Haskell. This approach, originally suggested by A. Hsu [1], focuses on eliminating recursion and branching from the compilation process to allow for concurrent execution of various compilation steps on a multi-core CPU or GPU.

In his research, Hsu proposes an array-based representation of an AST and implements it in APL, an array programming language. He describes a series of steps which allow grouping the nodes of the tree by the closest ancestor of a chosen type, e.g. Expr. Selecting different types makes it possible to accomplish several crucial compilation steps, namely expression flattening, function lifting, and constant folding.

Unfortunately, Hsu's findings might not be appreciated enough, since he uses an unpopular and hard-to-understand language, APL. Therefore, in my implementation I focused not only on replicating his result, but also on providing a purely functional, declarative-style interface for various general and specific operations. For example, I implemented such functions as innerProduct, selectRows, and the grouping part of the 2-dimensional key operator. Therefore, while the implementation is oriented towards parallel compilers, it could be used for other applications.

Unfortunately, the implementation turned out to be inefficient compared to the recursive approach. But this does not imply the general ineffectiveness of the approach, and should not discourage further research on this topic.

There are multiple areas of improvement for future research. Firstly, one could optimize the existing functions by using Accelerate more efficiently, or even implement some crucial functions natively. Another research goal is to add other general functions to the system, with the end goal of creating a library with the various array operators and functions found in APL and similar array programming languages, behind a purely functional, easy-to-use Haskell interface.
Bibliography cited

[1] A. W. Hsu, “The key to a data parallel compiler,” ser. ARRAY 2016, Association for Computing Machinery, 2016, pp. 32–40. doi: 10.1145/2935323.2935331. [Online]. Available: https://doi.org/10.1145/2935323.2935331.

[2] E. W. Weisstein. (2021). “Tree,” [Online]. Available: https://mathworld.wolfram.com/Tree.html (visited on 04/23/2021).

[3] HaskellWiki. (2016). “Applications and libraries/compiler tools,” [Online]. Available: https://wiki.haskell.org/index.php?title=Applications_and_libraries/Compiler_tools&oldid=60537 (visited on 06/19/2021).

[4] J. Grattage. (2008). “An overview of QML with a concrete implementation in Haskell.” arXiv: 0806.2735. [Online]. Available: https://arxiv.org/abs/0806.2735 (visited on 06/19/2021).

[5] R. Hui. (Apr. 2018). “Essays/Key,” [Online]. Available: https://code.jsoftware.com/wiki/Essays/Key (visited on 03/13/2021).

[6] E. Iverson. (2001). “J primer,” [Online]. Available: https://jsoftware.com/help/primer/contents.htm (visited on 03/13/2021).

[7] T. McDonell, M. Chakravarty, V. Grover, and R. Newton, “Type-safe runtime code generation: Accelerate to LLVM,” ACM SIGPLAN Notices, vol. 50, Aug. 2015. doi: 10.1145/2887747.2804313.

[8] HaskellWiki. (2016). “Numeric Haskell: A Repa tutorial,” [Online]. Available: https://wiki.haskell.org/index.php?title=Numeric_Haskell:_A_Repa_Tutorial&oldid=60893 (visited on 03/13/2021).

[9] (2021). “The Haskell Tool Stack,” [Online]. Available: https://docs.haskellstack.org/en/stable/README/#the-haskell-tool-stack (visited on 03/13/2021).

[10] (2010). “Haskell 2010 language report,” [Online]. Available: https://www.haskell.org/onlinereport/haskell2010/ (visited on 03/13/2021).

[11] (2020). “Implementations — Haskell wiki,” [Online]. Available: https://wiki.haskell.org/Implementations (visited on 03/13/2021).

[12] (2020). “GHC — Haskell wiki,” [Online]. Available: https://wiki.haskell.org/index.php?title=GHC&oldid=63344 (visited on 03/13/2021).