https://hal-lirmm.ccsd.cnrs.fr/lirmm-00813055
Approach based on instruction selection for fast and certified code generation
Christophe Mouilleron, Mohamed Amine Najahi, Guillaume Revy
DALI (Digits, Architectures et Logiciels Informatiques), LIRMM - Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Université de Montpellier / CNRS / Université de Perpignan Via Domitia
Conference paper, 2012
Keywords: fixed-point arithmetic, automatic code generation, instruction selection, numerical certification

Nowadays, floating-point arithmetic is ubiquitous in the specification and implementation of programs, including those targeted at embedded systems. However, due to chip area or power consumption constraints, some of these embedded systems still ship with no floating-point unit. In this case, only integer arithmetic is available at the hardware level. Hence, in order to run floating-point programs, we need either to use a library that emulates floating-point arithmetic in software (like the FLIP library), or to rewrite the programs so as to rely on fixed-point arithmetic instead.
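To make the fixed-point alternative concrete, here is a minimal sketch (not taken from the paper, and independent of CGPE) of how a real-valued multiplication can be emulated with integer arithmetic only, using a Q1.31 encoding; the format choice and helper names are illustrative:

```python
# Hypothetical sketch: emulating a real-valued multiplication with
# integer-only arithmetic, in a Q1.31 fixed-point format.

Q = 31  # number of fractional bits

def to_fixed(x):
    """Encode a real number in [-1, 1) as a 32-bit signed integer."""
    return int(round(x * (1 << Q)))

def from_fixed(f):
    """Decode a Q1.31 integer back to a float."""
    return f / (1 << Q)

def mul_fixed(a, b):
    """Fixed-point multiply: full-width integer product, then rescale.

    On a real integer processor this is a widening multiply followed
    by a right shift; the shift discards low-order bits and is one
    source of the rounding errors a certified generator must bound.
    """
    return (a * b) >> Q

x = to_fixed(0.5)
y = to_fixed(0.25)
print(from_fixed(mul_fixed(x, y)))  # 0.125
```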
Yet both approaches require the design of fixed-point routines, which turns out to be a tedious and error-prone task, especially since it is still partly done by hand. Thus, one of the current challenges is to design automatic tools that generate fixed-point programs as fast as possible while satisfying some accuracy constraints. In this sense, we have developed the CGPE software tool, dedicated to the generation of fast and certified code for evaluating bivariate polynomials in fixed-point arithmetic. This tool, based on the generation of several fast evaluation codes combined with a systematic numerical verification step, is well suited to VLIW integer processors using only binary adders and multipliers. We propose here an extension of CGPE that adds a step based on instruction selection, in order to improve the speed and the accuracy of the generated code on more advanced architectures. Given an instruction set architecture, instruction selection is the compilation process that aims at finding a sequence of instructions implementing "at best" a given program. It works on a target-independent intermediate representation of this program, represented as a tree or a directed acyclic graph (DAG), and is usually used to optimize code size or latency on the target architecture, while providing no guarantee on the accuracy of the generated code. The general problem of instruction selection has been well studied and, although it has been proven NP-complete even for simple machines in the case of DAGs, several algorithms exist to tackle it. In the context of CGPE, where we represent polynomial evaluation expressions with DAGs, we can benefit from this work on instruction selection by combining it with the numerical verification step already implemented. The advantage of our new approach is twofold. First, it is much more flexible than writing a generation algorithm for each available processor.
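The core idea of instruction selection can be sketched as pattern matching over the expression DAG. The toy pass below (an illustration, not CGPE's actual algorithm or data structures) greedily rewrites an add-of-multiply pattern into a single fused multiply-add instruction when the target provides one; node and instruction names are assumptions:

```python
# Hypothetical sketch of instruction selection on an expression tree/DAG:
# a bottom-up pass that tiles add(mul(a, b), c) with one fma instruction.
# Real selectors use cost-driven tiling; this greedy version only shows
# the pattern-matching idea.

def select(node):
    """Return (instructions, result_name) for the expression `node`.

    Leaves are variable names (strings); interior nodes are tuples
    ("add" | "mul", left, right). Instructions are emitted as
    (dest, opcode, operands...) tuples.
    """
    if isinstance(node, str):          # leaf: a variable
        return [], node
    op, left, right = node
    # Pattern: add(mul(a, b), c)  ->  fma(a, b, c)
    if op == "add" and isinstance(left, tuple) and left[0] == "mul":
        _, a, b = left
        code_a, r_a = select(a)
        code_b, r_b = select(b)
        code_c, r_c = select(right)
        dest = f"t{len(code_a) + len(code_b) + len(code_c)}"
        return code_a + code_b + code_c + [(dest, "fma", r_a, r_b, r_c)], dest
    code_l, r_l = select(left)
    code_r, r_r = select(right)
    dest = f"t{len(code_l) + len(code_r)}"
    return code_l + code_r + [(dest, op, r_l, r_r)], dest

# a*x + y selects to a single fused instruction
code, _ = select(("add", ("mul", "a", "x"), "y"))
print(code)  # [('t0', 'fma', 'a', 'x', 'y')]
```

In CGPE's setting, each candidate tiling would additionally pass through the numerical verification step, so that only selections meeting the accuracy constraint are kept.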
Indeed, it mainly needs to work on the DAG representation of the expression to be implemented, which is independent of the target architecture, and this makes it easier to handle various architectures shipping different kinds of instructions. Second, it allows us to automatically generate code optimized for a given target and satisfying various criteria, such as accuracy and performance, as well as code size. So far, this approach has been tested on the evaluation of polynomials, where it allows us to write efficient code that makes the best use of advanced architectural features such as a fused multiply-add operator.
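As an illustration of why a fused multiply-add operator matters for polynomial evaluation, the sketch below (an assumption-laden example, not a scheme generated by CGPE) evaluates a bivariate polynomial with a Horner-style recurrence in which every step is one multiply-add, i.e. one fma on a capable target:

```python
def fma(a, b, c):
    """Stand-in for a hardware fused multiply-add computing a*b + c.

    A real fma instruction rounds once, which both speeds up the code
    (one instruction instead of two) and tightens its error bound;
    this plain Python version rounds twice and only mimics the shape.
    """
    return a * b + c

def eval_poly(coeffs, x, y):
    """Evaluate p(x, y) = sum_{i,j} coeffs[i][j] * x^i * y^j.

    Nested Horner scheme: Horner in y for each row of coefficients,
    then Horner in x over the row results; every step is one fma.
    """
    acc = 0.0
    for row in reversed(coeffs):       # Horner in x
        inner = 0.0
        for c in reversed(row):        # Horner in y
            inner = fma(inner, y, c)
        acc = fma(acc, x, inner)
    return acc

# p(x, y) = 1 + 2y + 3x + 4xy, evaluated at (2, 3)
p = [[1.0, 2.0], [3.0, 4.0]]
print(eval_poly(p, 2.0, 3.0))  # 1 + 6 + 6 + 24 = 37.0
```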