M. G. Arnold, S. Collange, and D. Defour, Implementing LNS using filtering units of GPUs, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010.
DOI : 10.1109/ICASSP.2010.5495516

URL : https://hal.archives-ouvertes.fr/hal-00423434

S. Collange, M. Daumas, and D. Defour, Graphic processors to speedup simulations for the design of high performance solar receptors, ASAP, pp.377-382, 2007.

S. Collange, M. Daumas, and D. Defour, État de l'intégration de la virgule flottante dans les processeurs graphiques [State of floating-point integration in graphics processors], Revue des sciences et technologies de l'information, pp.719-733, 2008.

S. Collange, M. Daumas, and D. Defour, Line-by-line spectroscopic simulations on graphics processing units, Computer Physics Communications, vol.178, issue.2, pp.135-143, 2008.
DOI : 10.1016/j.cpc.2007.08.013

S. Collange, M. Daumas, and D. Defour, Chapter 9: Interval arithmetic in CUDA, GPU Computing Gems Jade Edition, pp.99-107, 2012.

S. Collange, M. Daumas, D. Defour, and R. Olivès, Fonctions élémentaires sur GPU exploitant la localité de valeurs [Elementary functions on GPU exploiting value locality], SYMPosium en Architectures nouvelles de machines (SYMPA), pp.1-11, 2008.

S. Collange, M. Daumas, D. Defour, and D. Parello, Étude comparée et simulation d'algorithmes de branchements pour le GPGPU [Comparative study and simulation of branching algorithms for GPGPU], SYMPosium en Architectures nouvelles de machines (SYMPA), 2009.

S. Collange, M. Daumas, D. Defour, and D. Parello, Barra: A Parallel Functional Simulator for GPGPU, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010.
DOI : 10.1109/MASCOTS.2010.43

S. Collange, D. Defour, S. Graillat, and R. Iakymchuk, Full-speed deterministic bit-accurate parallel floating-point summation, 2014.

S. Collange, D. Defour, and D. Parello, Barra, a Modular Functional GPU Simulator for GPGPU, 2009.

S. Collange, D. Defour, and A. Tisserand, Power Consumption of GPUs from a Software Perspective, Lecture Notes in Computer Science, vol.5544, pp.922-931, 2009.

S. Collange, D. Defour, and Y. Zhang, Dynamic detection of uniform and affine vectors in GPGPU computations, Euro-Par 3rd Workshop on Highly Parallel Processing on a Chip (HPPC), 2009.

S. Collange, J. Flórez, and D. Defour, A GPU interval library based on Boost interval, Real Numbers and Computers, pp.61-72, 2008.

G. Da Graça and D. Defour, Implementation of float-float operators on graphics hardware, RNC7, pp.23-32, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00021443

D. Defour and F. de Dinechin, Software Carry-Save: A Case Study for Instruction-Level Parallelism, PaCT, pp.207-214, 2003.
DOI : 10.1007/978-3-540-45145-7_18

D. Defour, Collapsing dependent floating point operations, IMACS World Congress Scientific Computation, Applied Mathematics and Simulation, pp.1-10, 2005.

D. Defour, Prédictibilité des ordonnanceurs des GPU [Predictability of GPU schedulers], SYMPosium en Architectures nouvelles de machines (SYMPA), pp.1-10, 2014.

D. Defour and B. Goossens, Implémentation de l'opérateur add2 [Implementation of the add2 operator], Research Report, vol.3, 2004.

D. Defour and M. Marin, Real-time simulation of power networks using multi-core architecture, DERBI, 2012.

D. Defour and M. Marin, FuzzyGPU: A Fuzzy Arithmetic Library for GPU, 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2014.
DOI : 10.1109/PDP.2014.16

URL : https://hal.archives-ouvertes.fr/hal-00856617

D. Defour and M. Marin, Regularity Versus Load-balancing on GPU for Treefix Computations, 2013 International Conference on Computational Science, pp.309-318, 2013.
DOI : 10.1016/j.procs.2013.05.194

URL : https://hal.archives-ouvertes.fr/hal-00768293

D. Defour and E. Petit, Températures, erreurs matérielles et GPU [Temperatures, hardware errors and GPUs], SYMPosium en Architectures nouvelles de machines (SYMPA), pp.1-10, 2013.

D. Defour and E. Petit, GPUburn: A system to test and mitigate GPU hardware failures, 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp.263-270, 2013.
DOI : 10.1109/SAMOS.2013.6621133

URL : https://hal.archives-ouvertes.fr/hal-00827588

B. Goossens and D. Defour, The instruction register file micro-architecture, Future Generation Computer Systems, pp.767-773, 2005.
DOI : 10.1016/j.future.2004.05.017

URL : https://hal.archives-ouvertes.fr/lirmm-01206362

B. Goossens and D. Defour, Ordonnancement dynamique distribué [Distributed dynamic scheduling], SympA, 2005.

B. Goossens and D. Defour, Ordonnancement distribué d'instructions [Distributed instruction scheduling], Technique et Science Informatiques, 2006.

M. Marin and D. Defour, CUDA et les formats de représentation des nombres flottants [CUDA and floating-point number representation formats], HPC Magazine, issue.4, pp.52-57, 2013.

P. S. Ahuja, K. Skadron, M. Martonosi, and D. W. Clark, Multipath execution: Opportunities and limits, International Conference on Supercomputing, pp.101-108, 1998.

C. Álvarez, J. Corbal, and M. Valero, Fuzzy Memoization for Floating-Point Multimedia Applications, IEEE Transactions on Computers, vol.54, issue.7, pp.922-927, 2005.
DOI : 10.1109/TC.2005.119

D. August, J. Chang, S. Girbal, D. Gracia-Pérez, G. Mouchard et al., UNISIM: An Open Simulation Environment and Library for Complex Architecture Design and Collaborative Development, IEEE Computer Architecture Letters, vol.6, issue.2, pp.45-48, 2007.
DOI : 10.1109/L-CA.2007.12

D. I. August, S. Malik, L. Peh, V. Pai, M. Vachharajani et al., Achieving Structural and Composable Modeling of Complex Systems, International Journal of Parallel Programming, vol.18, issue.6, pp.81-101, 2005.
DOI : 10.1007/s10766-005-3569-3

T. Austin, E. Larson, and D. Ernst, SimpleScalar: an infrastructure for computer system modeling, Computer, vol.35, issue.2, pp.59-67, 2002.
DOI : 10.1109/2.982917

D. H. Bailey, A Fortran 90-based multiprecision system, ACM Transactions on Mathematical Software, vol.21, issue.4, pp.379-387, 1995.
DOI : 10.1145/212066.212075

A. Bakhoda, G. Yuan, W. W. Fung, H. Wong, and T. M. Aamodt, Analyzing CUDA workloads using a detailed GPU simulator, 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009.
DOI : 10.1109/ISPASS.2009.4919648

S. Balakrishnan and G. S. Sohi, Exploiting value locality in physical register files, Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36), 2003.
DOI : 10.1109/MICRO.2003.1253201

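D. Baz, Contribution à l'algorithmique parallèle. Le concept d'asynchronisme : étude théorique, mise en oeuvre et application [Contribution to parallel algorithmics. The concept of asynchronism: theoretical study, implementation and application], 1998.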

E. Benowitz, M. Ercegovac, and F. Fallah, Reducing the latency of division operations with partial caching, Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, pp.1598-1602, 2002.

F. Benz, A. Hildebrandt, and S. Hack, A dynamic program analysis to find floating-point accuracy problems, Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, pp.453-462, 2012.

N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi et al., The M5 simulator: Modeling networked systems, IEEE Micro, vol.26, issue.4, pp.52-60, 2006.

S. Bonvicini, P. Leonelli, and G. Spadoni, Risk analysis of hazardous materials transportation: evaluating uncertainty by means of fuzzy logic, Journal of Hazardous Materials, vol.62, issue.1, pp.59-74, 1998.
DOI : 10.1016/S0304-3894(98)00158-7

K. Briggs, The doubledouble library, 1998.

N. Brisebarre, J.-M. Muller, and S. K. Raina, Accelerating correctly rounded floating-point division when the divisor is known in advance, IEEE Transactions on Computers, vol.53, issue.8, pp.1069-1072, 2004.
DOI : 10.1109/TC.2004.37

H. Brönnimann, G. Melquiond, and S. Pion, The design of the Boost interval arithmetic library, Theoretical Computer Science, vol.351, issue.1, pp.111-118, 2006.
DOI : 10.1016/j.tcs.2005.09.062

D. Brooks, V. Tiwari, and M. Martonosi, Wattch, ACM SIGARCH Computer Architecture News, vol.28, issue.2, pp.83-94, 2000.
DOI : 10.1145/342001.339657

I. Buck, K. Fatahalian, and P. Hanrahan, GPUBench: Evaluating GPU performance for numerical and scientific applications, Proceedings of the ACM Workshop on General-Purpose Computing on Graphics Processors, 2004.

I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian et al., Brook for GPUs: Stream computing on graphics hardware, Proceedings of SIGGRAPH 2004, pp.777-786, 2004.

J. A. Butts and G. S. Sohi, Use-based register caching with decoupled indexing, Proceedings of the 31st Annual International Symposium on Computer Architecture, pp.302-313, 2004.

B. Calder and D. Grunwald, Reducing branch costs via branch alignment, 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp.242-251, 1994.

J. J. Chan, B. Sharma, J. Lv, G. Thomas, R. Thulasiram et al., True Random Number Generator Using GPUs and Histogram Equalization Techniques, 2011 IEEE International Conference on High Performance Computing and Communications, pp.161-170, 2011.
DOI : 10.1109/HPCC.2011.30

C. Chang and Y. Wu, The genetic algorithm based tuning method for symmetric membership functions of fuzzy logic control systems, International IEEE/IAS Conference on Industrial Automation and Control: Emerging Technologies, pp.421-428, 1995.

C. Chen, C. Lin, and S. Huang, A fuzzy approach for supplier evaluation and selection in supply chain management, International Journal of Production Economics, vol.102, issue.2, pp.289-301, 2006.
DOI : 10.1016/j.ijpe.2005.03.009

I.-C. K. Chen, C.-C. Lee, and T. N. Mudge, Instruction prefetching using branch prediction information, ICCD, pp.593-601, 1997.

M. Chen and D. A. Linkens, Rule-base self-generation and simplification for data-driven fuzzy models, The 10th IEEE International Conference on Fuzzy Systems, pp.424-427, 2001.

D. Citron and D. G. Feitelson, Hardware memoization of mathematical and trigonometric functions, 2000.

M. A. Clark, R. Babich, K. Barros, R. C. Brower, and C. Rebbi, Solving lattice QCD systems of equations using mixed precision solvers on GPUs, Computer Physics Communications, vol.181, issue.9, pp.1517-1528, 2010.
DOI : 10.1016/j.cpc.2010.05.002

W. J. Cody, Algorithm 665: MACHAR: a subroutine to dynamically determine machine parameters, ACM Transactions on Mathematical Software, vol.14, issue.4, pp.303-311, 1988.
DOI : 10.1145/50063.51907

W. J. Cody, Algorithm 714: CELEFUNT: a portable test package for complex elementary functions, ACM Transactions on Mathematical Software, vol.19, issue.1, pp.1-21, 1993.
DOI : 10.1145/151271.151272

W. J. Cody, Algorithm 715: SPECFUN, a portable FORTRAN package of special function routines and test drivers, ACM Transactions on Mathematical Software, vol.19, issue.1, pp.22-30, 1993.
DOI : 10.1145/151271.151273

W. J. Cody and R. Karpinski, A Proposed Radix- and Word-length-independent Standard for Floating-point Arithmetic, IEEE Micro, vol.4, issue.4, pp.86-100, 1984.
DOI : 10.1109/MM.1984.291224

B. W. Coon and J. E. Lindholm, System and method for managing divergent threads in a SIMD architecture, 2008.

J. L. Cruz, A. Gonzalez, M. Valero, and N. P. Topham, Multiple-banked register file architectures, Proceedings of the 27th Annual International Symposium on Computer Architecture, pp.316-325, 2000.

D. B. Clifton (ATI Technologies), Method and system for approximating sine and cosine functions, 2001.

F. de Dinechin, C. Klein, and B. Pasca, Generating high-performance custom floating-point pipelines, 19th International Conference on Field Programmable Logic and Applications, pp.59-64, 2009.

Y. S. Deng, B. D. Wang, and S. Mu, Taming irregular EDA applications on GPUs, Proceedings of the 2009 International Conference on Computer-Aided Design, ICCAD '09, pp.539-546, 2009.

G. Diamos, A. Kerr, and M. Kesavan, Translating GPU binaries to tiered SIMD architectures with Ocelot, 2009.

D. Dubois and H. Prade, Operations on fuzzy numbers, International Journal of Systems Science, vol.9, issue.6, pp.613-626, 1978.

D. Ernst and T. Austin, Efficient dynamic scheduling through tag elimination, Proceedings of the 29th Annual International Symposium on Computer Architecture, pp.37-46, 2002.

T. Granlund, GNU Multiple Precision Arithmetic Library.

A. Eustace and A. Srivastava, ATOM: A flexible interface for building high performance program analysis tools, Proceedings of the Winter 1995 USENIX Technical Conference on UNIX and Advanced Computing Systems, pp.303-314, 1995.

R. Fernando and M. J. Kilgard, The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics, 2003.

J. Fisher, Trace Scheduling: A Technique for Global Microcode Compaction, IEEE Transactions on Computers, vol.30, issue.7, pp.478-490, 1981.
DOI : 10.1109/TC.1981.1675827

W. M. Fitch, Toward defining the course of evolution: Minimum change for a specific tree topology, Syst Biol, vol.20, pp.406-416, 1971.

J. Flórez, M. Sbert, M. Sainz, and J. Vehí, Efficient Ray Tracing Using Interval Analysis, Parallel Processing and Applied Mathematics, pp.1351-1360, 2008.
DOI : 10.1007/978-3-540-68111-3_143

W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, Dynamic warp formation and scheduling for efficient GPU control flow, MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pp.407-420, 2007.

W. M. Gentleman and S. B. Marovitch, More on algorithms that reveal properties of floating point arithmetic units, Communications of the ACM, vol.17, issue.5, 1974.

D. Göddeke, R. Strzodka, and S. Turek, Accelerating double precision FEM simulations with GPUs, Proceedings of ASIM 2005 - 18th Symposium on Simulation Technique, 2005.

R. Gonzalez, A. Cristal, M. Pericàs, A. Veidenbaum, and M. Valero, Scalable Distributed Register File, Workshop on Complexity-effective Design held in conjunction with the 31st International Symposium on Computer Architecture, 2004.

F. Goualard, Gaol 3.1.1: Not just another interval arithmetic library, Laboratoire d'Informatique de Nantes-Atlantique, 2006.

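J. Guilhemsang, Test en ligne pour la détection des fautes intermittentes dans les architectures multiprocesseurs embarquées [Online testing for the detection of intermittent faults in embedded multiprocessor architectures], PhD thesis, 2011.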

J. Guilhemsang, O. Héron, N. Ventroux, O. Goncalves, and A. Giulieri, Impact of the application activity on intermittent faults in embedded systems, 29th VLSI Test Symposium, pp.191-196, 2011.
DOI : 10.1109/VTS.2011.5783782

E. Hao, P. Chang, M. Evers, and Y. N. Patt, Increasing the instruction fetch rate via block-structured instruction set architectures, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29, pp.449-478, 1996.
DOI : 10.1109/MICRO.1996.566461

I. S. Haque and V. S. Pande, Hard data on soft errors: A large-scale assessment of real-world error rates in GPGPU, Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, CCGRID '10, pp.691-696, 2010.

P. Harish and P. J. Narayanan, Accelerating Large Graph Algorithms on the GPU Using CUDA, Proceedings of the 14th international conference on High performance computing, HiPC'07, pp.197-208, 2007.
DOI : 10.1007/978-3-540-77220-0_21

K. A. Hawick, A. Leist, and D. P. Playne, Parallel graph component labelling with GPUs and CUDA, Parallel Computing, vol.36, issue.12, pp.655-678, 2010.
DOI : 10.1016/j.parco.2010.07.002

O. Héron, J. Guilhemsang, N. Ventroux, and A. Giulieri, Analysis of online self-testing policies for real-time embedded multiprocessors in DSM technologies, IOLTS, pp.49-55.

Y. Hida, X. Li, and D. H. Bailey, Algorithms for quad-double precision floating point arithmetic, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001, pp.155-162, 2001.
DOI : 10.1109/ARITH.2001.930115

G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean et al., The microarchitecture of the Pentium 4 processor, Intel Technology Journal, p.1, 2001.

M. Hussein, A. Varshney, and L. S. Davis, On implementing graph cuts on CUDA, First Workshop on General Purpose Processing on Graphics Processing Units, 2007.

K. Kailas, M. Franklin, and K. Ebcioglu, A Register File Architecture and Compilation Scheme for Clustered ILP Processors, Proceedings of the 8th International Euro-Par Conference on Parallel Processing, pp.500-511, 2002.
DOI : 10.1007/3-540-45706-2_68

U. J. Kapasi, W. J. Dally, S. Rixner, P. R. Mattson, J. D. Owens et al., Efficient conditional operations for data-parallel architectures, MICRO 33: Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, pp.159-170, 2000.

R. Karpinski, PARANOIA: A floating-point benchmark, Byte, vol.10, issue.2, pp.223-235, 1985.

N. S. Kim and T. Mudge, Reducing register ports using delayed write-back queues and operand pre-fetch, Proceedings of the 17th annual international conference on Supercomputing, ICS '03, pp.172-182, 2003.
DOI : 10.1145/782814.782839

M. O. Lam, J. K. Hollingsworth, and G. W. Stewart, Dynamic floating-point cancellation detection, Parallel Computing, vol.39, issue.3, pp.146-155, 2013.

P. Langlois and N. Louvet, More instruction level parallelism explains the actual efficiency of compensated algorithms, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00165020

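C. Lauter, Arrondi correct de fonctions mathématiques : fonctions univariées et bivariées, certification et automatisation [Correct rounding of mathematical functions: univariate and bivariate functions, certification and automation], 2008.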

A. R. Lebeck, J. Koppanalil, T. Li, J. Patwardhan, and E. Rotenberg, A large, fast instruction window for tolerating cache misses, Proceedings of the 29th Annual International Symposium on Computer Architecture, pp.59-70, 2002.

C. Leiserson and B. M. Maggs, Communication-efficient parallel algorithms for distributed random-access machines, Algorithmica, vol.11, issue.2, pp.53-77, 1988.
DOI : 10.1007/BF01762110

E. Lindholm, J. Nickolls, S. F. Oberman, and J. Montrym, NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, issue.2, pp.39-55, 2008.
DOI : 10.1109/MM.2008.31

E. Lindholm, M. Y. Siu, S. S. Moy, S. Liu, and J. R. Nickolls, Simulating multiported memories using lower port count memories, 2008.

M. H. Lipasti, C. B. Wilkerson, and J. P. Shen, Value locality and load value prediction, SIGOPS Oper. Syst. Rev, vol.30, issue.5, pp.138-147, 1996.

B. Lisper, Towards parallel programming models for predictability, 12th International Workshop on Worst-Case Execution Time Analysis, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, pp.48-58, 2012.

J. Liu, B. Jaiyen, R. Veras, and O. Mutlu, RAIDR, ISCA, pp.1-12, 2012.
DOI : 10.1145/2366231.2337161

R. A. Lorie and H. R. Strong, Method for conditional branch execution in SIMD vector processors, US Patent 4,435,758, 1984.

L. Luo, M. Wong, and W. Hwu, An effective GPU implementation of breadth-first search, Proceedings of the 47th Design Automation Conference, DAC '10, pp.52-55, 2010.

P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hållberg et al., Simics: A full system simulation platform, Computer, vol.35, issue.2, pp.50-58, 2002.

S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher et al., Characterizing the impact of predicated execution on branch prediction, Proceedings of the 27th Annual International Symposium on Microarchitecture, pp.217-227, 1994.

M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu et al., Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset, 2005.

D. Merrill, M. Garland, and A. Grimshaw, Scalable GPU graph traversal, ACM SIGPLAN Notices, vol.47, issue.8, pp.117-128, 2012.
DOI : 10.1145/2370036.2145832

P. Michaud and A. Seznec, Data flow prescheduling for large instruction windows in out-of-order processors, Proceedings of the 7th International Symposium on High-Performance Computer Architecture, pp.27-36, 2001.

V. Miranda and J. Saraiva, Fuzzy modelling of power system optimal load flow, Power Industry Computer Application Conference Conference Proceedings, pp.386-392, 1991.

R. E. Moore, R. B. Kearfott, and M. J. Cloud, Introduction to interval analysis, Society for Industrial and Applied Mathematics, 2009.
DOI : 10.1137/1.9780898717716

V. Moya, C. Gonzalez, J. Roca, A. Fernandez, and R. Espasa, Shader Performance Analysis on a Modern GPU Architecture, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05), pp.355-364, 2005.
DOI : 10.1109/MICRO.2005.30

S. M. Mueller, C. Jacobi, H. Oh, K. D. Tran, S. R. Cottier et al., The vector floating-point unit in a synergistic processor element of a cell processor, pp.59-67, 2005.

B. S. Nordquist, J. R. Nickolls, and L. I. Bacayo, Parallel data processing systems and methods using cooperative thread arrays and simd instruction issue, 2009.

S. F. Oberman and M. J. Flynn, On division and reciprocal caches, 1995.

S. F. Oberman and M. Y. Siu, A high-performance area-efficient multifunction interpolator, Proceedings of the 17th IEEE Symposium on Computer Arithmetic (Cape Cod, USA), pp.272-279, 2005.

S. Palacharla, N. P. Jouppi, and J. E. Smith, Complexity-effective superscalar processors, Proceedings of the 24th Annual International Symposium on Computer Architecture, pp.206-218, 1997.

I. Park, M. D. Powell, and T. N. Vijaykumar, Reducing register ports for higher speed and lower energy, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings., pp.171-182, 2002.
DOI : 10.1109/MICRO.2002.1176248

M. Parks, Number theoretic test generation for directed rounding, pp.241-248, 1999.

A. Peleg and U. Weiser, Dynamic flow instruction cache memory organized around trace segments independent of virtual address line. US Patent 5, 1992.

D. Gracia-Pérez, G. Mouchard, and O. Temam, MicroLib: A case for the quantitative comparison of micro-architecture mechanisms, MICRO 37: Proceedings of
URL : https://hal.archives-ouvertes.fr/inria-00001110

P. C. Mills, J. E. Lindholm, B. W. Coon, G. M. Tarolli et al., Scheduling instructions from multi-thread instruction buffer based on phase boundary qualifying rule for phases of math and data access operations with better caching, 2008.

J. Phillips and S. Vassiliadis, High-performance 3-1 interlock collapsing ALU's, IEEE Transactions on Computers, vol.43, issue.3, pp.257-268, 1994.
DOI : 10.1109/12.272427

J. Pierce and T. N. Mudge, Wrong-path instruction prefetching, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29, pp.165-175, 1996.
DOI : 10.1109/MICRO.1996.566459

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

S. P. Harbison III, A computer architecture for the dynamic optimization of high-level language programs, 1980.

D. M. Priest, Algorithms for arbitrary precision floating point arithmetic, Proceedings 10th IEEE Symposium on Computer Arithmetic, pp.132-144, 1991.
DOI : 10.1109/ARITH.1991.145549

M. Ramirez, A. Cristal, A. Veidenbaum, L. Villa, and M. Valero, Direct Instruction Wakeup for Out-of-Order Processors, Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'04), pp.2-9, 2004.
DOI : 10.1109/IWIA.2004.10002

G. Reinman, B. Calder, and T. Austin, A scalable front-end architecture for fast instruction delivery, Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA'99) of Computer Architecture News, pp.234-245, 1999.

G. Reinman, B. Calder, and T. M. Austin, Fetch directed instruction prefetching, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture, pp.16-27, 1999.
DOI : 10.1109/MICRO.1999.809439

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

N. Revol and F. Rouillier, The MPFI library, 2001.
URL : https://hal.archives-ouvertes.fr/inria-00544998

S. E. Richardson, Exploiting trivial and redundant computation, Proceedings of the 11th IEEE Symposium on Computer Arithmetic, pp.220-227, 1993.

M. Rosenblum, E. Bugnion, S. Devine, and S. A. Herrod, Using the SimOS machine simulator to study complex computer systems, ACM Transactions on Modeling and Computer Simulation, vol.7, issue.1, pp.78-103, 1997.
DOI : 10.1145/244804.244807

E. Rotenberg, S. Bennett, and J. E. Smith, Trace cache: a low latency approach to high bandwidth instruction fetching, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29, pp.24-34, 1996.
DOI : 10.1109/MICRO.1996.566447

C. Rubio-González, C. Nguyen, H. D. Nguyen, J. Demmel, W. Kahan et al., Precimonious, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, pp.27:1-27:12, 2013.
DOI : 10.1145/2503210.2503296

S. M. Rump, Fast and parallel interval arithmetic, BIT Numerical Mathematics, vol.39, issue.3, pp.534-554, 1999.
DOI : 10.1023/A:1022374804152

A. Y. Saber and G. K. Venayagamoorthy, Resource scheduling under uncertainty in a smart grid with renewables and plug-in vehicles, IEEE Systems Journal, vol.6, issue.1, pp.103-109, 2012.

D. Sankoff, Minimal mutation trees of sequences, SIAM Journal on Applied Mathematics, 1975.

S. R. Sarangi, B. Greskamp, and J. Torrellas, CADRE: Cycle-accurate deterministic replay for hardware debugging, DSN, pp.301-312, 2006.

T. Sato, Y. Nakamura, and I. Arita, Revisiting direct tag search algorithm on superscalar processors, Workshop on Complexity-effective Design held in conjunction with the 28th International Symposium on Computer Architecture, 2001.

Y. Sazeides, S. Vassiliadis, and J. E. Smith, The performance potential of data dependence speculation & collapsing, Proceedings of the 29th annual IEEE/ACM international symposium on Microarchitecture, pp.238-247, 1996.

B. Schroeder, E. Pinheiro, and W. Weber, DRAM errors in the wild, Communications of the ACM, vol.54, issue.2, pp.100-107, 2011.
DOI : 10.1145/1897816.1897844

N. L. Schryer, A test of a computer's floating-point arithmetic unit, 1981.

S. Sengupta, M. Harris, Y. Zhang, and J. D. Owens, Scan primitives for GPU computing, Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, GH '07, pp.97-106, 2007.

E. Senn, J. Laurent, N. Julien, and E. Martin, SoftExplorer: Estimating and Optimizing the Power and Energy Consumption of a C Program for DSP Applications, EURASIP Journal on Advances in Signal Processing, vol.2005, issue.16, pp.2641-2654, 2005.
DOI : 10.1155/ASP.2005.2641

URL : https://hal.archives-ouvertes.fr/hal-00077302

A. Seznec, E. Toullec, and O. Rochecouste, Register write specialization register read specialization: a path to complexity-effective wide-issue superscalar processors, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings., pp.383-394, 2002.
DOI : 10.1109/MICRO.2002.1176265

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

J. W. Sheaffer, D. Luebke, and K. Skadron, A flexible simulation framework for graphics architectures, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, HWWS '04, pp.85-94, 2004.
DOI : 10.1145/1058129.1058142

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

J. R. Shewchuk, Adaptive precision floating-point arithmetic and fast robust geometric predicates, Discrete and Computational Geometry, pp.305-363, 1997.

D. Shirmohammadi, H. W. Hong, A. Semlyen, and G. X. Luo, A compensation-based power flow method for weakly meshed distribution and transmission networks, IEEE Transactions on Power Systems, vol.3, issue.2, pp.753-762, 1988.
DOI : 10.1109/59.192932

D. Stevenson, A Proposed Standard for Binary Floating-Point Arithmetic, Computer, vol.14, issue.3, pp.51-62, 1981.
DOI : 10.1109/C-M.1981.220377

D. Stevenson, An American national standard: IEEE standard for binary floating point arithmetic, ACM SIGPLAN Notices, vol.22, issue.2, pp.9-25, 1987.

R. Strzodka, Virtual 16 bit precise operations on RGBA8 textures, Proceedings of Vision, Modeling, and Visualization, pp.171-178, 2002.

J. A. Swenson and Y. N. Patt, Hierarchical registers for scientific computers, Proceedings of the 2nd international conference on Supercomputing, ICS '88, pp.346-353, 1988.
DOI : 10.1145/55364.55398

C. Turinici, C. Rochange, and P. Sainrat, Prédicteurs mixtes pour l'anticipation des instructions [Mixed predictors for instruction anticipation], 5ème Symposium sur les Architectures Nouvelles de Machines (SYMPA'5), pp.165-174, 1999.

D. A. Vallado, P. Crawford, R. Hujsak, and T. S. Kelso, Revisiting Spacetrack Report #3, Proceedings of the AIAA/AAS Astrodynamics Specialist Conference, 2006.

S. Vassiliadis, J. Phillips, and B. Blanner, Interlock collapsing ALU's, IEEE Transactions on Computers, vol.42, issue.7, pp.825-839, 1993.
DOI : 10.1109/12.237723

A. V. Veidenbaum, Q. Zhao, and A. Shameer, Non-sequential instruction cache prefetching for multiple-issue processors, International Journal of High Speed Computing, vol.10, issue.01, pp.115-140, 1999.
DOI : 10.1142/S0129053399000065

B. Verdonk, A. Cuyt, and D. Verschaeren, A precision- and range-independent tool for testing floating-point arithmetic I: basic operations, square root, and remainder, ACM Transactions on Mathematical Software, vol.27, issue.1, pp.92-118, 2001.
DOI : 10.1145/382043.382404

A. Verma, A. K. Verma, H. Parandeh-Afshar, P. Brisk, and P. Ienne, Synthesis of Floating-Point Addition Clusters on FPGAs Using Carry-Save Arithmetic, 2010 International Conference on Field Programmable Logic and Applications, pp.19-24, 2010.
DOI : 10.1109/FPL.2010.15

V. Volkov, Better performance at lower occupancy, Proceedings of the GPU Technology Conference, 2010.

Z. Wei and J. Jaja, Optimization of linked list prefix computations on multithreaded GPUs using CUDA, 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp.1-8, 2010.
DOI : 10.1142/S0129626412500120

S. Weiss and J. E. Smith, Instruction issue logic for pipelined supercomputers, Proceedings of the 11th International Symposium on Computer Architecture, pp.110-118, 1984.

C. M. Wittenbrink, E. Kilgariff, and A. Prabhu, Fermi GF100 GPU Architecture, IEEE Micro, vol.31, issue.2, pp.50-59, 2011.
DOI : 10.1109/MM.2011.24

H. Wong, M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos, Demystifying GPU microarchitecture through microbenchmarking, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), 2010.
DOI : 10.1109/ISPASS.2010.5452013

W. Ye, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, The design and use of simplepower, Proceedings of the 37th conference on Design automation, DAC '00, pp.340-345, 2000.
DOI : 10.1145/337292.337436

Y. Yoshida, M. Yasuda, J. Nakagami, and M. Kurano, A new evaluation of mean value for fuzzy numbers and its application to American put option under uncertainty, Fuzzy Sets and Systems, pp.2614-2626, 2006.

R. Yung and N. Wilhelm, Caching processor general registers, Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors, pp.307-312, 1995.
DOI : 10.1109/ICCD.1995.528826

L. A. Zadeh, The role of fuzzy logic in the management of uncertainty in expert systems, Fuzzy Sets and Systems, vol.11, issue.1-3, pp.197-198, 1983.
DOI : 10.1016/S0165-0114(83)80081-5

J. Zalamea, J. Llosa, E. Ayguadé, and M. Valero, Hierarchical clustered register file organization for VLIW processors, Proceedings International Parallel and Distributed Processing Symposium, p.77, 2003.
DOI : 10.1109/IPDPS.2003.1213178

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=