Accelerating HQC Key Encapsulation Mechanism with AVX-512

doi:10.1145/3803627.3805815

2026

Conference
Accelerating HQC Key Encapsulation Mechanism with AVX-512
Roberto Cabral, Armando Faz-Hernandez, Julio López
Proceedings of the ACM ASIA Public-Key Cryptography Workshop, 2026
DOI Code Details
Best Paper Award
Post-Quantum Cryptography (PQC) aims to secure digital communications against adversaries powered by quantum computers. To be useful, PQC algorithms must be both secure and fast. The Hamming Quasi-Cyclic (HQC) key encapsulation mechanism is a primary code-based alternative to lattice-based standards such as ML-KEM. However, initial benchmarks on Intel processors using AVX2 show that HQC is 9.5x slower than ML-KEM. In this paper, we show that careful analysis and optimization of the implementation can substantially narrow this gap. Using AVX-512 instructions in conjunction with architecture-independent refinements, we significantly accelerated internal HQC operations, including the Reed-Muller decoding step, polynomial multiplications, and the SHA3 hash function. These improvements narrow the performance gap between HQC and ML-KEM to 5.6x. Our contribution brings high-performance implementations of alternative schemes like HQC that are essential for long-term cryptographic agility.
```
@inproceedings{cabral_apkc_asiaccs2026,
  author = {Roberto Cabral and Armando Faz-Hernandez and Julio López},
  title = {Accelerating HQC Key Encapsulation Mechanism with AVX-512},
  booktitle = {Proceedings of the ACM ASIA Public-Key Cryptography Workshop},
  publisher = {Association for Computing Machinery},
  location = {Bengaluru, India},
  address = {New York, NY, USA},
  series = {APKC '26},
  pages = {1--10},
  year = {2026},
  month = {may},
  isbn = {9798400725777},
  doi = {10.1145/3803627.3805815}
}
```
Journal
High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves (Thesis Distillation)
Armando Faz-Hernandez, Julio López
Journal of the Brazilian Computer Society, 2026
DOI Details
Cryptography based on elliptic curves is endowed with efficient methods for public-key cryptography. Recent research has shown the superiority of the Montgomery and Edwards curves over the Weierstrass curves as they require fewer arithmetic operations. Using these modern curves has, however, introduced several challenges to the cryptographic algorithm's design, opening up new opportunities for optimization. Our main objective is to propose algorithmic optimizations and implementation techniques for cryptographic algorithms based on elliptic curves. In order to speed up the execution of these algorithms, our approach relies on the use of extensions to the instruction set architecture. In addition to those specific for cryptography, we use extensions that follow the Single Instruction, Multiple Data (SIMD) parallel computing paradigm. In this model, the processor executes the same operation over a set of data in parallel. We investigated how to apply SIMD to the implementation of elliptic curve algorithms. As part of our contributions, we design parallel algorithms for prime field and elliptic curve arithmetic. We also design a new three-point ladder algorithm for the scalar multiplication P+kQ, and a faster formula for calculating 3P on Montgomery curves. These algorithms have found applicability in isogeny-based cryptography. Using SIMD extensions such as SSE, AVX, and AVX2, we develop optimized implementations of the following cryptographic algorithms: X25519, X448, SIDH, ECDH, ECDSA, EdDSA, and qDSA. Performance benchmarks show that these implementations are faster than existing implementations in the state of the art. Our study confirms that using extensions to the instruction set architecture is an effective tool for optimizing implementations of cryptographic algorithms based on elliptic curves. 
May this be an incentive not only for those seeking to speed up programs in general but also for computer manufacturers to include more advanced extensions that support the increasing demand for cryptography.
```
@article{faz_jbcs2026,
  author = {Armando Faz-Hernandez and Julio López},
  title = {High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves (Thesis Distillation)},
  journal = {Journal of the Brazilian Computer Society},
  volume = {32},
  number = {1},
  pages = {516--526},
  year = {2026},
  month = {mar},
  issn = {1678-4804},
  doi = {10.5753/jbcs.2026.5548}
}
```

2025

Conference
Rhizomes and the Roots of Efficiency — Improving Prio
Armando Faz-Hernandez
Progress in Cryptology - LATINCRYPT 2025 - 9th International Conference on Cryptology and Information Security in Latin America, 2025
DOI e-print Code Details
Prio, tailored under privacy-by-design principles, is a protocol for aggregating client-provided measurements between non-colluding entities. The validity of measurements is determined by using a fully linear probabilistically-checkable proof (FLPCP). The Prover distributes secret shares of the measurement and the proof to multiple Verifiers. These Verifiers can only use linear queries on the input statement for validation without accessing the actual measurement. Efficiency is key for the practical application of Prio. The FLPCP operates with polynomials represented in the Lagrange basis using roots of unity as the nodes. However, we observe opportunities to improve its performance by embracing the Lagrange basis more extensively. For instance, we show an inversion-free O(n) time-complexity algorithm for polynomial evaluation in the Lagrange basis (an alternative to the classic rational barycentric formula). By applying our methods to libprio-rs, a cutting-edge Rust implementation, the Sharding phase (proof generation) runs 36 percent faster and the Prep-Init phase (proof verification) is twice as fast, showing a substantial acceleration of the most time-consuming phases of Prio.
```
@inproceedings{faz_latincrypt2025,
  author = {Armando Faz-Hernandez},
  title = {Rhizomes and the Roots of Efficiency — Improving Prio},
  booktitle = {Progress in Cryptology - LATINCRYPT 2025 - 9th International Conference on Cryptology and Information Security in Latin America},
  editor = {Daniel Escudero and Ivan Damgård},
  publisher = {Springer Nature Switzerland},
  location = {Medellín, Colombia},
  series = {Lecture Notes in Computer Science},
  volume = {16129},
  pages = {425--449},
  year = {2025},
  month = {oct},
  isbn = {978-3-032-06754-8},
  doi = {10.1007/978-3-032-06754-8_16}
}
```

2024

Journal
High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves
Armando Faz-Hernandez, Julio López
CLEI Electronic Journal, 2024
DOI Details
Cryptography based on elliptic curves is endowed with efficient methods for public-key cryptography. Recent research has shown the superiority of the Montgomery and Edwards curves over the Weierstrass curves as they require fewer arithmetic operations. Using these modern curves has, however, introduced several challenges to the cryptographic algorithm’s design, opening up new opportunities for optimization. Our main objective is to propose algorithmic optimizations and implementation techniques for cryptographic algorithms based on elliptic curves. In order to speed up the execution of these algorithms, our approach relies on the use of extensions to the instruction set architecture. In addition to those specific for cryptography, we use extensions that follow the Single Instruction, Multiple Data (SIMD) parallel computing paradigm. In this model, the processor executes the same operation over a set of data in parallel. We investigated how to apply SIMD to the implementation of elliptic curve algorithms. As part of our contributions, we design parallel algorithms for prime field and elliptic curve arithmetic. We also design a new three-point ladder algorithm for the scalar multiplication P + kQ, and a faster formula for calculating 3P on Montgomery curves. These algorithms have found applicability in isogeny-based cryptography. Using SIMD extensions such as SSE, AVX, and AVX2, we develop optimized implementations of the following cryptographic algorithms: X25519, X448, SIDH, ECDH, ECDSA, EdDSA, and qDSA. Performance benchmarks show that these implementations are faster than existing implementations in the state of the art. Our study confirms that using extensions to the instruction set architecture is an effective tool for optimizing implementations of cryptographic algorithms based on elliptic curves. 
May this be an incentive not only for those seeking to speed up programs in general but also for computer manufacturers to include more advanced extensions that support the increasing demand for cryptography.
```
@article{faz_cleiej2024,
  author = {Armando Faz-Hernandez and Julio López},
  title = {High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves},
  journal = {CLEI Electronic Journal},
  volume = {27},
  number = {3},
  year = {2024},
  month = {aug},
  pages = {1--10},
  issn = {0717-5000},
  doi = {10.19153/cleiej.27.3.3}
}
```
Abstract
High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves (Thesis Summary)
Armando Faz-Hernandez, Julio López
Anais Estendidos do XXIV Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais, 2024
DOI Details
Honorable Mention: Best PhD Thesis
Cryptography based on elliptic curves is endowed with efficient methods for public-key cryptography. Recent research has shown the superiority of the Montgomery and Edwards curves over the Weierstrass curves as they require fewer arithmetic operations. Using these modern curves has, however, introduced several challenges to the cryptographic algorithm's design, opening up new opportunities for optimization. Our main objective is to propose algorithmic optimizations and implementation techniques for cryptographic algorithms based on elliptic curves. In order to speed up the execution of these algorithms, our approach relies on the use of extensions to the instruction set architecture. In addition to those specific for cryptography, we use extensions that follow the Single Instruction, Multiple Data (SIMD) parallel computing paradigm. In this model, the processor executes the same operation over a set of data in parallel. We investigated how to apply SIMD to the implementation of elliptic curve algorithms. As part of our contributions, we design parallel algorithms for prime field and elliptic curve arithmetic. We also design a new three-point ladder algorithm for the scalar multiplication P+kQ, and a faster formula for calculating 3P on Montgomery curves. These algorithms have found applicability in isogeny-based cryptography. Using SIMD extensions such as SSE, AVX, and AVX2, we develop optimized implementations of the following cryptographic algorithms: X25519, X448, SIDH, ECDH, ECDSA, EdDSA, and qDSA. Performance benchmarks show that these implementations are faster than existing implementations in the state of the art. Our study confirms that using extensions to the instruction set architecture is an effective tool for optimizing implementations of cryptographic algorithms based on elliptic curves. 
May this be an incentive not only for those seeking to speed up programs in general but also for computer manufacturers to include more advanced extensions that support the increasing demand for cryptography.
```
@inproceedings{faz_ctd_sbseg2024,
  author = {Armando Faz-Hernandez and Julio López},
  title = {High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves (Thesis Summary)},
  booktitle = {Anais Estendidos do XXIV Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais},
  editor = {Lourenço Alves Pereira Júnior and Diego Kreutz},
  location = {São José dos Campos, SP. Brazil},
  publisher = {Sociedade Brasileira de Computação},
  address = {Porto Alegre, RS, Brazil},
  year = {2024},
  month = {sep},
  pages = {49--56},
  doi = {10.5753/sbseg_estendido.2024.241959}
}
```

2023

RFC
RFC 9380: Hashing to Elliptic Curves
Armando Faz-Hernández, Sam Scott, Nick Sullivan, Riad S. Wahby, Christopher A. Wood
RFC Editor, 2023
DOI Details
```
@article{rfc9380,
  author = {Armando Faz-Hernández and
            Sam Scott and
            Nick Sullivan and
            Riad S. Wahby and
            Christopher A. Wood},
  title = {RFC 9380: Hashing to Elliptic Curves},
  series = {Request for Comments},
  publisher = {RFC Editor},
  number = {9380},
  pages = {1--145},
  year = {2023},
  month = {aug},
  doi = {10.17487/RFC9380}
}
```
RFC
RFC 9497: Oblivious Pseudorandom Functions (OPRFs) Using Prime-Order Groups
Alex Davidson, Armando Faz-Hernández, Nick Sullivan, Christopher A. Wood
RFC Editor, 2023
DOI Details
An Oblivious Pseudorandom Function (OPRF) is a two-party protocol between a client and a server for computing the output of a Pseudorandom Function (PRF). The server provides the PRF private key, and the client provides the PRF input. At the end of the protocol, the client learns the PRF output without learning anything about the PRF private key, and the server learns neither the PRF input nor output. An OPRF can also satisfy a notion of 'verifiability', called a VOPRF. A VOPRF ensures clients can verify that the server used a specific private key during the execution of the protocol. A VOPRF can also be partially oblivious, called a POPRF. A POPRF allows clients and servers to provide public input to the PRF computation. This document specifies an OPRF, VOPRF, and POPRF instantiated within standard prime-order groups, including elliptic curves. This document is a product of the Crypto Forum Research Group (CFRG) in the IRTF.
```
@article{rfc9497,
  author = {Alex Davidson and
            Armando Faz-Hernández and
            Nick Sullivan and
            Christopher A. Wood},
  title = {RFC 9497: Oblivious Pseudorandom Functions (OPRFs) Using Prime-Order Groups},
  series = {Request for Comments},
  publisher = {RFC Editor},
  number = {9497},
  pages = {1--61},
  year = {2023},
  month = {dec},
  doi = {10.17487/RFC9497}
}
```
Conference
Portunus: Re-imagining Access Control in Distributed Systems
Watson Ladd, Tanya Verma, Marloes Venema, Armando Faz-Hernández, Brendan McMillion, Avani Wildani, Nick Sullivan
2023 USENIX Annual Technical Conference (USENIX ATC 23), 2023
HTML PDF Video Details
TLS termination, which is essential to network and security infrastructure providers, is an extremely latency-sensitive operation that benefits from access to sensitive key material close to the edge. However, increasing regulatory concerns prompt customers to demand sophisticated controls on where their keys may be accessed. While traditional access-control solutions rely on a highly-available centralized process to enforce access, the round-trip latency and decreased fault tolerance make this approach unappealing. Furthermore, the desired level of customer control is at odds with the homogeneity of the distribution process for each key. To solve this dilemma, we have designed and implemented Portunus, a cryptographic storage and access control system built using a variant of public-key cryptography called attribute-based encryption (ABE). Using Portunus, TLS keys are protected using ABE under a policy chosen by the customer. Each server is issued unique ABE keys based on its attributes, allowing it to decrypt only the TLS keys for which it satisfies the policy. Thus, the encrypted keys can be stored at the edge, with access control enforced passively through ABE. If a server receives a TLS connection but is not authorized to decrypt the necessary TLS key, the request is forwarded directly to the nearest authorized server, further avoiding the need for a centralized coordinator. In comparison, a trivial instantiation of this system using standard public-key cryptography might wrap each TLS key with the key of every authorized data center. This strategy, however, multiplies the storage overhead by the number of data centers. Deployed across Cloudflare's 400+ global data centers, Portunus handles millions of requests per second globally, making it one of the largest deployments of ABE.
```
@inproceedings{ladd_atc2023,
  author = {Watson Ladd and
            Tanya Verma and
            Marloes Venema and
            Armando Faz-Hernández and
            Brendan McMillion and
            Avani Wildani and
            Nick Sullivan},
  title = {Portunus: Re-imagining Access Control in Distributed Systems},
  booktitle = {2023 USENIX Annual Technical Conference (USENIX ATC 23)},
  publisher = {USENIX Association},
  address = {Boston, MA},
  isbn = {978-1-939133-35-9},
  pages = {35--52},
  year = {2023},
  month = {jul},
  note = {https://www.usenix.org/conference/atc23/presentation/ladd}
}
```
Abstract
High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves (Thesis Summary)
Armando Faz-Hernández, Julio López
Anais do XXXVI Concurso de Teses e Dissertações, 2023
DOI Details
Cryptography based on elliptic curves is endowed with efficient methods for public-key cryptography. Recent research has shown the superiority of the Montgomery and Edwards curves over the Weierstrass curves as they require fewer arithmetic operations. Using these modern curves has, however, introduced several challenges to the cryptographic algorithm's design, opening up new opportunities for optimization. Our main objective is to propose algorithmic optimizations and implementation techniques for cryptographic algorithms based on elliptic curves. In order to speed up the execution of these algorithms, our approach relies on the use of extensions to the instruction set architecture. In addition to those specific for cryptography, we use extensions that follow the Single Instruction, Multiple Data (SIMD) parallel computing paradigm. In this model, the processor executes the same operation over a set of data in parallel. We investigated how to apply SIMD to the implementation of elliptic curve algorithms. As part of our contributions, we design parallel algorithms for prime field and elliptic curve arithmetic. We also design a new three-point ladder algorithm for the scalar multiplication P+kQ, and a faster formula for calculating 3P on Montgomery curves. These algorithms have found applicability in isogeny-based cryptography. Using SIMD extensions such as SSE, AVX, and AVX2, we develop optimized implementations of the following cryptographic algorithms: X25519, X448, SIDH, ECDH, ECDSA, EdDSA, and qDSA. Performance benchmarks show that these implementations are faster than existing implementations in the state of the art. Our study confirms that using extensions to the instruction set architecture is an effective tool for optimizing implementations of cryptographic algorithms based on elliptic curves. 
May this be an incentive not only for those seeking to speed up programs in general but also for computer manufacturers to include more advanced extensions that support the increasing demand for cryptography.
```
@inproceedings{faz_ctd_csbc2023,
  author = {Armando Faz-Hernández and Julio López},
  title = {High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves (Thesis Summary)},
  booktitle = {Anais do XXXVI Concurso de Teses e Dissertações},
  publisher = {Sociedade Brasileira de Computação},
  series = {Anais do XLIII Congresso da Sociedade Brasileira de Computação},
  address = {Porto Alegre, RS, Brasil},
  location = {João Pessoa, PB. Brazil},
  doi = {10.5753/ctd.2023.230156},
  issn = {2763-8820},
  year = {2023},
  month = {aug},
  pages = {20--29}
}
```
Unpublished
Post-Quantum Privacy Pass via Post-Quantum Anonymous Credentials
Guru-Vamsi Policharla, Bas Westerbaan, Armando Faz-Hernández, Christopher A. Wood
IACR Cryptology ePrint Archive, 2023
HTML PDF Details
It is known that one can generically construct a post-quantum anonymous credential scheme, supporting the showing of arbitrary predicates on its attributes using general-purpose zero-knowledge proofs secure against quantum adversaries [Fischlin, CRYPTO 2006]. Traditionally, such a generic instantiation is thought to come with impractical sizes and performance. We show that with careful choices and optimizations, such a scheme can perform surprisingly well. In fact, it performs competitively against state-of-the-art post-quantum blind signatures, for the simpler problem of post-quantum unlinkable tokens, required for a post-quantum version of Privacy Pass. To wit, a post-quantum Privacy Pass constructed in this way using zkDilithium, our proposal for a STARK-friendly variation on Dilithium2, allows for a trade-off between token size (85-175 KB) and generation time (0.3-5 s) with a proof security level of 115 bits. Verification of these tokens can be done in 20-30 ms. We argue that these tokens are reasonably practical, adding less than a second upload time over traditional tokens, supported by a measurement study. Finally, we point out a clear advantage of our approach: the flexibility afforded by the general-purpose zero-knowledge proofs. We demonstrate this by showing how we can construct a rate-limited variant of Privacy Pass that does not rely on non-collusion for privacy.
```
@article{policharla_iacr2023,
  author = {Guru-Vamsi Policharla and
            Bas Westerbaan and
            Armando Faz-Hernández and
            Christopher A. Wood},
  title = {Post-Quantum Privacy Pass via Post-Quantum Anonymous Credentials},
  journal = {IACR Cryptology ePrint Archive},
  volume = {2023},
  number = {414},
  year = {2023},
  month = {mar},
  note = {https://eprint.iacr.org/2023/414}
}
```

2022

Thesis
High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves
Armando Faz-Hernandez
2022 · PhD Thesis
HTML Details
Cryptography based on elliptic curves is endowed with efficient methods for public-key cryptography. Recent research has shown the superiority of the Montgomery and Edwards curves over the Weierstrass curves as they require fewer arithmetic operations. Using these modern curves has, however, introduced several challenges to the cryptographic algorithm's design, opening up new opportunities for optimization. Our main objective is to propose algorithmic optimizations and implementation techniques for cryptographic algorithms based on elliptic curves. In order to speed up the execution of these algorithms, our approach relies on the use of extensions to the instruction set architecture. In addition to those specific for cryptography, we use extensions that follow the Single Instruction, Multiple Data (SIMD) parallel computing paradigm. In this model, the processor executes the same operation over a set of data in parallel. We investigated how to apply SIMD to the implementation of elliptic curve algorithms. As part of our contributions, we design parallel algorithms for prime field and elliptic curve arithmetic. We also design a new three-point ladder algorithm for the scalar multiplication P + kQ, and a faster formula for calculating 3P on Montgomery curves. These algorithms have found applicability in isogeny-based cryptography. Using SIMD extensions such as SSE, AVX, and AVX2, we develop optimized implementations of the following cryptographic algorithms: X25519, X448, SIDH, ECDH, ECDSA, EdDSA, and qDSA. Performance benchmarks show that these implementations are faster than existing implementations in the state of the art. Our study confirms that using extensions to the instruction set architecture is an effective tool for optimizing implementations of cryptographic algorithms based on elliptic curves. 
May this be an incentive not only for those seeking to speed up programs in general but also for computer manufacturers to include more advanced extensions that support the increasing demand for cryptography.
```
@phdthesis{faz_phd_thesis,
  author = {Armando Faz-Hernandez},
  title = {High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves},
  school = {University of Campinas},
  address = {Campinas, SP. Brazil},
  type = {PhD thesis},
  year = {2022},
  month = {sep},
  note = {https://hdl.handle.net/20.500.12733/6756}
}
```
Conference
Let The Right One In: Attestation as a Usable CAPTCHA Alternative
Tara Whalen, Thibault Meunier, Mrudula Kodali, Alex Davidson, Marwan Fayed, Armando Faz-Hernández, Watson Ladd, Deepak Maram, Nick Sullivan, Benedikt Wolters, Maxime Guerreiro, Andrew Galloni
Eighteenth Symposium on Usable Privacy and Security (SOUPS 2022), 2022
HTML PDF Slides Video Details
CAPTCHAs are necessary to protect websites from bots and malicious crawlers, yet are increasingly solvable by automated systems. This has led to more challenging tests that require greater human effort and cultural knowledge; they may prevent bots effectively but sacrifice usability and discourage the human users they are meant to admit. We propose a new class of challenge: a Cryptographic Attestation of Personhood (CAP) as the foundation of a usable, pro-privacy alternative. Our challenge is constructed using the open Web Authentication API (WebAuthn) that is supported in most browsers. We evaluated the CAP challenge through a public demo, with an accompanying user survey. Our evaluation indicates that CAP has a strong likelihood of adoption by users who possess the necessary hardware, showing good results for effectiveness and efficiency as well as a strong expressed preference for using CAP over traditional CAPTCHA solutions. In addition to demonstrating a mechanism for more usable challenge tests, we identify some areas for improvement for the WebAuthn user experience, and reflect on the difficult usable privacy problems in this domain and how they might be mitigated.
```
@inproceedings{whalen_soups2022,
  author = {Tara Whalen and
            Thibault Meunier and
            Mrudula Kodali and
            Alex Davidson and
            Marwan Fayed and
            Armando Faz-Hernández and
            Watson Ladd and
            Deepak Maram and
            Nick Sullivan and
            Benedikt Wolters and
            Maxime Guerreiro and
            Andrew Galloni},
  title = {Let The Right One In: Attestation as a Usable CAPTCHA Alternative},
  booktitle = {Eighteenth Symposium on Usable Privacy and Security (SOUPS 2022)},
  publisher = {USENIX Association},
  pages = {599--612},
  isbn = {978-1-939133-30-4},
  address = {Boston, MA},
  year = {2022},
  month = {aug}
}
```

2021

Conference
Implementing and Measuring KEMTLS
Sofía Celi, Armando Faz-Hernández, Nick Sullivan, Goutam Tamvada, Luke Valenta, Thom Wiggers, Bas Westerbaan, Christopher A. Wood
Progress in Cryptology - LATINCRYPT 2021 - 7th International Conference on Cryptology and Information Security in Latin America, Bogotá, Colombia, October 6-8, 2021, Proceedings, 2021
DOI e-print Details
KEMTLS is a novel alternative to the Transport Layer Security (TLS) handshake that integrates post-quantum algorithms. It uses key encapsulation mechanisms (KEMs) for both confidentiality and authentication, achieving post-quantum security while obviating the need for expensive post-quantum signatures. The original KEMTLS paper presents a security analysis, Rust implementation, and benchmarks over emulated networks. In this work, we provide full Go implementations of KEMTLS and other post-quantum handshake alternatives, describe their integration into a distributed system, and provide performance evaluations over real network conditions. We compare the standard (non-quantum-resistant) TLS 1.3 handshake with three alternatives: one that uses post-quantum signatures in combination with post-quantum KEMs (PQTLS), one that uses KEMTLS, and one that is a reduced round trip version of KEMTLS (KEMTLS-PDK). In addition to the performance evaluations, we discuss how the design of these protocols impacts TLS from an implementation and configuration perspective.
```
@inproceedings{celi_latincrypt2021,
  author = {Sofía Celi and
            Armando Faz-Hernández and
            Nick Sullivan and
            Goutam Tamvada and
            Luke Valenta and
            Thom Wiggers and
            Bas Westerbaan and
            Christopher A. Wood},
  title = {Implementing and Measuring KEMTLS},
  booktitle = {Progress in Cryptology - LATINCRYPT 2021 - 7th International Conference on Cryptology and Information Security in Latin America, Bogotá, Colombia, October 6-8, 2021, Proceedings},
  editor = {Longa, Patrick and Ràfols, Carla},
  publisher = {Springer International Publishing},
  address = {Bogotá, Colombia},
  isbn = {978-3-030-88238-9},
  pages = {88--107},
  year = {2021},
  month = {oct},
  doi = {10.1007/978-3-030-88238-9_5}
}
```
Conference
ZKAttest: Ring and Group Signatures for Existing ECDSA Keys
Armando Faz-Hernández, Watson Ladd, Deepak Maram
Selected Areas in Cryptography - 28th International Conference, SAC 2021. Virtual Event, September 29 - October 1, 2021, 2021
DOI e-print Details
Cryptographic keys are increasingly stored in dedicated hardware or behind software interfaces. Doing so limits access, such as permitting only signing via ECDSA. This makes using them in existing ring and group signature schemes impossible as these schemes assume the ability to access the private key for other operations. We present a sigma-protocol that uses a committed public key to verify an ECDSA or Schnorr signature on a message, without revealing the public key. We then discuss how this protocol may be used to derive ring signatures in combination with Groth-Kohlweiss membership proofs and other applications. This scheme has been implemented and source code is freely available.
```
@inproceedings{faz_sac2021,
  author = {Armando Faz-Hernández and
            Watson Ladd and
            Deepak Maram},
  title = {ZKAttest: Ring and Group Signatures for Existing ECDSA Keys},
  booktitle = {Selected Areas in Cryptography - 28th International Conference, SAC 2021. Virtual Event, September 29 - October 1, 2021},
  pages = {68--83},
  editor = {AlTawy, Riham and Hülsing, Andreas},
  publisher = {Springer International Publishing},
  address = {Canada},
  isbn = {978-3-030-99277-4},
  year = {2021},
  month = {oct},
  doi = {10.1007/978-3-030-99277-4_4}
}
```

2020

Conference
Generation of Elliptic Curve Points in Tandem
Armando Faz-Hernández, Julio López
Anais do XX Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais, 2020
A hash to curve function H, mapping bit strings to points on an elliptic curve, is often required in cryptographic schemes based on elliptic curves. Its construction is based on a deterministic encoding and a cryptographic hash function, which complementarily dominate its execution time. To improve the performance of H, we propose a parallel strategy where two units execute in tandem the internal operations of H. We instantiate this approach with a parallel software implementation of a hash to curve function that outputs points on a twisted Edwards curve. A performance benchmark on Haswell and Skylake micro-architectures shows that our parallel implementation is 1.4 times faster than its sequential implementation.
```
@inproceedings{faz_sbseg2020,
  author = {Armando Faz-Hernández and Julio López},
  title = {Generation of Elliptic Curve Points in Tandem},
  booktitle = {Anais do XX Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais},
  location = {Petrópolis, RJ, Brazil},
  editor = {Igor Monteiro Moraes and Luis Kowada},
  address = {Porto Alegre, RS, Brasil},
  year = {2020},
  month = {oct},
  pages = {1--9},
  publisher = {Sociedade Brasileira de Computação},
  doi = {10.5753/sbseg.2020.19230}
}
```

2019

Journal
High-performance Implementation of Elliptic Curve Cryptography Using Vector Instructions
Armando Faz-Hernández, Julio López, Ricardo Dahab
ACM Transactions on Mathematical Software, 2019
Elliptic curve cryptosystems are considered an efficient alternative to conventional systems such as DSA and RSA. Recently, Montgomery and Edwards elliptic curves have been used to implement cryptosystems. In particular, the elliptic curves Curve25519 and Curve448 were used for instantiating Diffie-Hellman protocols named X25519 and X448. Mapping these curves to twisted Edwards curves allowed deriving two new signature instances, called Ed25519 and Ed448, of the Edwards Digital Signature Algorithm. In this work, we focus on the secure and efficient software implementation of these algorithms using SIMD parallel processing. We present software techniques that target the Intel AVX2 vector instruction set for accelerating prime field arithmetic and elliptic curve operations. Our contributions result in a high-performance software library for AVX2-ready processors. For example, our library computes digital signatures 19 percent (for Ed25519) and 29 percent (for Ed448) faster than previous optimized implementations. Also, our library improves by 10 percent and 20 percent the execution time of X25519 and X448, respectively.
```
@article{faz_toms2019,
  author = {Armando Faz-Hernández and
            Julio López and
            Ricardo Dahab},
  title = {High-performance Implementation of Elliptic Curve Cryptography Using Vector Instructions},
  journal = {ACM Transactions on Mathematical Software},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  issn = {0098-3500},
  volume = {45},
  number = {3},
  pages = {1--35},
  year = {2019},
  month = {jul},
  doi = {10.1145/3309759}
}
```

2018

Journal
A Faster Software Implementation of the Supersingular Isogeny Diffie-Hellman Key Exchange Protocol
Armando Faz-Hernández, Julio López, Eduardo Ochoa-Jiménez, Francisco Rodríguez-Henríquez
IEEE Transactions on Computers, 2018
Since its introduction by Jao and De Feo in 2011, the supersingular isogeny Diffie-Hellman (SIDH) key exchange protocol has positioned itself as a promising candidate for post-quantum cryptography. One salient feature of the SIDH protocol is that it requires exceptionally short key sizes. However, the latency associated with SIDH is higher than that reported for other post-quantum cryptosystem proposals. Aiming to accelerate the SIDH runtime performance, we present in this work several algorithmic optimizations targeting both elliptic-curve and field arithmetic operations. We introduce in the context of the SIDH protocol a more efficient approach for calculating the elliptic curve operation P+[k]Q. Our strategy achieves a factor 1.4 speedup compared with the popular variable-three-point ladder algorithm regularly used in the SIDH shared secret phase. Moreover, by profiting from pre-computation techniques, our algorithm yields a factor 1.7 acceleration for the computation of this operation in the SIDH key generation phase. We also present an optimized evaluation of the point tripling formula, and discuss several algorithmic and implementation techniques that lead to faster field arithmetic computations. A software implementation of the above improvements on an Intel Skylake Core i7-6700 processor gives a factor 1.33 speedup against the state-of-the-art software implementation of the SIDH protocol reported by Costello-Longa-Naehrig in CRYPTO 2016.
```
@article{faz_tc2018,
  author = {Armando Faz-Hernández and
            Julio López and
            Eduardo Ochoa-Jiménez and
            Francisco Rodríguez-Henríquez},
  title = {A Faster Software Implementation of the Supersingular Isogeny Diffie-Hellman Key Exchange Protocol},
  journal = {IEEE Transactions on Computers},
  volume = {67},
  number = {11},
  pages = {1622--1636},
  year = {2018},
  month = {nov},
  doi = {10.1109/TC.2017.2771535}
}
```
Conference
SoK: A Performance Evaluation of Cryptographic Instruction Sets on Modern Architectures
Armando Faz-Hernández, Julio López, Ana Karina D. S. de Oliveira
Proceedings of the 5th ACM on ASIA Public-Key Cryptography Workshop at Asia CCS, Incheon, Republic of Korea, 2018
The latest processors have included extensions to the instruction set architecture tailored to speed up the execution of cryptographic algorithms. Like the AES New Instructions (AES-NI) that target the AES encryption algorithm, the release of the SHA New Instructions (SHA-NI), designed to support the SHA-256 hash function, introduces a new scenario for optimizing cryptographic software. In this work, we present a performance evaluation of several cryptographic algorithms, hash-based signatures and data encryption, on platforms that support AES-NI and/or SHA-NI. In particular, we revisited several optimization techniques targeting multiple-message hashing, and as a result, we reduce by 21 percent the running time of this task by means of a pipelined SHA-NI implementation. In public-key cryptography, multiple-message hashing is one of the critical operations of the XMSS and XMSS^MT post-quantum hash-based digital signatures. Using SHA-NI extensions, signatures are computed 4x faster; however, our pipelined SHA-NI implementation increased this speedup factor to 4.3x. For symmetric cryptography, we revisited the implementation of AES modes of operation and reduced by 12 percent and 7 percent the running time of CBC decryption and CTR encryption, respectively.
```
@inproceedings{faz_apkc_asiaccs2018,
  author = {Armando Faz-Hernández and
            Julio López and
            Ana Karina D. S. de Oliveira},
  title = {SoK: A Performance Evaluation of Cryptographic Instruction Sets on Modern Architectures},
  booktitle = {Proceedings of the 5th ACM on ASIA Public-Key Cryptography Workshop at Asia CCS, Incheon, Republic of Korea},
  pages = {9--18},
  year = {2018},
  month = {jun},
  doi = {10.1145/3197507.3197511},
  isbn = {9781450357562},
  publisher = {Association for Computing Machinery},
  address = {Incheon, Republic of Korea},
  series = {APKC'18}
}
```

2017

Conference
How to (Pre-)Compute a Ladder - Improving the Performance of X25519 and X448
Thomaz Oliveira, Julio López, Hüseyin Hisil, Armando Faz-Hernández, Francisco Rodríguez-Henríquez
Selected Areas in Cryptography - SAC 2017 - 24th International Conference, Ottawa, ON, Canada, August 16-18, 2017., 2017
In the RFC 7748 memorandum, the Internet Research Task Force specified a Montgomery-ladder scalar multiplication function based on two recently adopted elliptic curves, “curve25519” and “curve448”. The purpose of this function is to support the Diffie-Hellman key exchange algorithm that will be included in the forthcoming version of the Transport Layer Security cryptographic protocol. In this paper, we describe a ladder variant that makes it possible to accelerate the fixed-point multiplication function inherent to the Diffie-Hellman key pair generation phase. Our proposal combines a right-to-left version of the Montgomery ladder with the pre-computation of constant values directly derived from the base-point and its multiples. To our knowledge, this is the first proposal of a Montgomery ladder procedure for prime elliptic curves that admits the extensive use of pre-computation. In exchange for very modest memory resources and a small extra programming effort, the proposed ladder obtains significant speedups for software implementations. Moreover, our proposal fully complies with the RFC 7748 specification. A software implementation of the X25519 and X448 functions using our pre-computable ladder yields an acceleration factor of roughly 1.20 and 1.25 when implemented on the Haswell and the Skylake micro-architectures, respectively.
```
@inproceedings{oliveira_sac2017,
  author = {Thomaz Oliveira and
            Julio López and
            Hüseyin Hisil and
            Armando Faz-Hernández and
            Francisco Rodríguez-Henríquez},
  title = {How to (Pre-)Compute a Ladder - Improving the Performance of X25519 and X448},
  booktitle = {Selected Areas in Cryptography - SAC 2017 - 24th International Conference, Ottawa, ON, Canada, August 16-18, 2017.},
  pages = {172--191},
  year = {2017},
  month = {aug},
  editor = {Adams, Carlisle and Camenisch, Jan},
  publisher = {Springer International Publishing},
  address = {Ottawa, Canada},
  isbn = {978-3-319-72565-9},
  doi = {10.1007/978-3-319-72565-9_9}
}
```
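For readers unfamiliar with the function that the paper above accelerates, the following is a minimal Python sketch of the standard left-to-right Montgomery ladder for X25519 as specified in RFC 7748. It is not the right-to-left, pre-computation-based variant proposed in the paper. The function names (`decode_scalar`, `decode_u`, `cswap`, `x25519`) are chosen here for illustration, and Python integers are not constant-time, so this is only a functional reference.

```python
# Functional sketch of X25519 (RFC 7748) using the standard Montgomery ladder.
# Assumption: illustration only; Python big integers are NOT constant-time.

P = 2**255 - 19   # prime field of Curve25519
A24 = 121665      # (486662 - 2) / 4, the ladder constant from RFC 7748


def decode_scalar(k: bytes) -> int:
    """Clamp a 32-byte scalar as required by RFC 7748."""
    b = bytearray(k)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")


def decode_u(u: bytes) -> int:
    """Decode a 32-byte u-coordinate, masking the unused top bit."""
    b = bytearray(u)
    b[31] &= 127
    return int.from_bytes(b, "little") % P


def cswap(swap: int, a: int, b: int):
    """Swap a and b when swap == 1, using a mask instead of a branch."""
    mask = -swap              # 0 or -1 (all ones)
    t = mask & (a ^ b)
    return a ^ t, b ^ t


def x25519(k: bytes, u: bytes) -> bytes:
    """Return the u-coordinate of [k]U as 32 little-endian bytes."""
    scalar = decode_scalar(k)
    x1 = decode_u(u)
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in range(254, -1, -1):          # scan scalar bits, most significant first
        kt = (scalar >> t) & 1
        swap ^= kt
        x2, x3 = cswap(swap, x2, x3)
        z2, z3 = cswap(swap, z2, z3)
        swap = kt
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = pow((da + cb) % P, 2, P)               # differential addition
        z3 = x1 * pow((da - cb) % P, 2, P) % P
        x2 = aa * bb % P                            # doubling
        z2 = e * (aa + A24 * e) % P
    x2, x3 = cswap(swap, x2, x3)
    z2, z3 = cswap(swap, z2, z3)
    # Convert from projective (X : Z) to affine via Fermat inversion.
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")
```

With the base point `(9).to_bytes(32, "little")`, two parties holding secret keys `a` and `b` obtain the same shared secret from `x25519(a, x25519(b, base))` and `x25519(b, x25519(a, base))`. The key-generation call `x25519(a, base)` is the fixed-point multiplication that the paper targets.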
Conference
A Secure and Efficient Implementation of the Quotient Digital Signature Algorithm (qDSA)
Armando Faz-Hernández, Hayato Fujii, Diego F. Aranha, Julio López
Security, Privacy, and Applied Cryptography Engineering - 7th International Conference, SPACE 2017, Goa, India, December 13-17, 2017, Proceedings, 2017
Digital signatures provide a means to publicly authenticate messages sent over an insecure channel. Recently, the Quotient Digital Signature Algorithm (qDSA) was introduced, aiming at key compatibility with the Diffie-Hellman X25519 function. Due to the novelty of qDSA, there remains a need for an optimized implementation that allows identifying the real impact of this new algorithm. In this work, we focus on the secure and efficient implementation of qDSA. By leveraging precomputation in Joye’s right-to-left algorithm, we reduced the running time of signature generation by 30–35 percent, and the running time of the verification procedure by 19 percent. In addition, for increased security, we show a verification method that validates qDSA signatures unequivocally. All of these improvements were included in an optimized software library targeting 32-bit ARM and 64-bit Intel architectures. The improved performance achieved on these platforms positions qDSA as a competitive alternative for deploying digital signatures efficiently and securely.
```
@inproceedings{faz_space2017,
  author = {Armando Faz-Hernández and
            Hayato Fujii and
            Diego F. Aranha and
            Julio López},
  title = {A Secure and Efficient Implementation of the Quotient Digital Signature Algorithm (qDSA)},
  booktitle = {Security, Privacy, and Applied Cryptography Engineering - 7th International Conference, SPACE 2017, Goa, India, December 13-17, 2017, Proceedings},
  address = {Goa, India},
  pages = {170--189},
  year = {2017},
  month = {dec},
  isbn = {978-3-319-71501-8},
  editor = {Ali, Sk Subidh
            and Danger, Jean-Luc
            and Eisenbarth, Thomas},
  publisher = {Springer International Publishing},
  doi = {10.1007/978-3-319-71501-8_10}
}
```

2016

Conference
Speeding up Elliptic Curve Cryptography on the P-384 Curve
Armando Faz-Hernández, Julio López
Anais do XVI Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais, 2016
Honorable Mention Award
P-384 is one of the elliptic curves standardized by ANSI and NIST. This curve provides a 192-bit security level and is used in the computation of digital signatures and key-agreement protocols. Although several publicly available cryptographic libraries support the P-384 curve, they perform poorly. In this work, we present software techniques for accelerating cryptographic operations using the P-384 curve: first, we use the latest vector instructions of Intel processors to implement the prime field arithmetic; second, we devise a parallel scheduling of the complete formulas for the point addition law. As a result, on the Skylake micro-architecture, our software implementation is 15 percent and 40 percent faster than the OpenSSL library for computing ECDSA signatures and the ECDH protocol, respectively.
```
@inproceedings{faz_sbseg2016,
  author = {Armando Faz-Hernández and Julio López},
  title = {Speeding up Elliptic Curve Cryptography on the P-384 Curve},
  booktitle = {Anais do XVI Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais},
  location = {Niterói, RJ, Brasil},
  publisher = {Sociedade Brasileira de Computação},
  address = {Porto Alegre, RS, Brasil},
  year = {2016},
  month = {nov},
  pages = {170--183},
  doi = {10.5753/sbseg.2016.19306}
}
```

2015

Journal
Efficient and secure algorithms for GLV-based scalar multiplication and their implementation on GLV-GLS curves (extended version)
Armando Faz-Hernández, Patrick Longa, Ana Helena Sánchez
Journal of Cryptographic Engineering, 2015
We propose efficient algorithms and formulas that improve the performance of side-channel protected elliptic curve computations with special focus on scalar multiplication exploiting the Gallant–Lambert–Vanstone (CRYPTO 2001) and Galbraith–Lin–Scott (EUROCRYPT 2009) methods. Firstly, by adapting Feng et al.’s recoding to the GLV setting, we derive new regular algorithms for variable-base scalar multiplication that offer protection against simple side-channel and timing attacks. Secondly, we propose an efficient, side-channel protected algorithm for fixed-base scalar multiplication which combines Feng et al.’s recoding with Lim-Lee’s comb method. Thirdly, we propose an efficient technique that interleaves ARM and NEON-based multiprecision operations over an extension field to improve performance of GLS curves on modern ARM processors. Finally, we showcase the efficiency of the proposed techniques by implementing a state-of-the-art GLV–GLS curve in twisted Edwards form defined over Fp^2, which supports a four-dimensional decomposition of the scalar and is fully protected against timing attacks. Analysis and performance results are reported for modern x64 and ARM processors. For instance, we compute a variable-base scalar multiplication in 89,000 and 244,000 cycles on an Intel Ivy Bridge and an ARM Cortex-A15 processor, respectively; using a precomputed table of 6KB, we compute a fixed-base scalar multiplication in 49,000 and 116,000 cycles, respectively; and using a precomputed table of 3KB, we compute a double-scalar multiplication in 115,000 and 285,000 cycles, respectively. The proposed techniques represent an important improvement of the state-of-the-art performance of elliptic curve computations, and allow us to set new speed records in several modern processors. The techniques also reduce the cost of adding protection against timing attacks in the computation of GLV-based variable-base scalar multiplication to below 10 percent.
This work is the extended version of a publication that appeared at CT-RSA (Faz-Hernández et al. Topics in Cryptology, CT-RSA 2014, vol. 8366, pp. 1–27 2014).
```
@article{faz_jcen2015,
  author = {Armando Faz-Hernández and
            Patrick Longa and
            Ana Helena Sánchez},
  title = {Efficient and secure algorithms for GLV-based scalar multiplication and their implementation on GLV-GLS curves (extended version)},
  journal = {Journal of Cryptographic Engineering},
  volume = {5},
  number = {1},
  pages = {31--52},
  year = {2015},
  month = {apr},
  doi = {10.1007/S13389-014-0085-7}
}
```
Chapter
Implementação Eficiente e Segura de Algoritmos Criptográficos
Armando Faz-Hernández, Roberto Cabral, Diego F. Aranha, Julio López
Minicursos do Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais, 2015
Implementing a cryptographic algorithm in software is not an easy task even for advanced programmers, because it requires careful knowledge not only of the algorithms but also of the target architecture. In this tutorial, we will describe some techniques to produce an efficient and secure software implementation. For the sake of efficiency, we will detail how advanced vector instruction sets accelerate the execution of the following cryptographic algorithms: the AES encryption algorithm, the SHA-3 cryptographic hash function, and the key agreement protocol based on the elliptic curve Curve25519. Focusing on secure software development, we will illustrate some implementations that are vulnerable to side-channel attacks; we will also present some countermeasures that mitigate such attacks, thereby preventing leakage of secret information.
```
@incollection{faz_sbseg2015,
  author = {Armando Faz-Hernández and Roberto Cabral and Diego F. Aranha and Julio López},
  title = {Implementação Eficiente e Segura de Algoritmos Criptográficos},
  booktitle = {Minicursos do Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais},
  publisher = {Sociedade Brasileira de Computação},
  address = {Florianópolis, SC, Brasil},
  editor = {Eduardo Souto and Michelle Wangham and Joni Fraga},
  chapter = {3},
  doi = {10.5753/sbc.9004.8.3},
  isbn = {978-85-7669-304-8},
  language = {pt},
  year = {2015},
  month = {nov},
  pages = {93--140},
  volume = {183}
}
```
Conference
Fast Implementation of Curve25519 Using AVX2
Armando Faz-Hernández, Julio López
Progress in Cryptology - LATINCRYPT 2015 - 4th International Conference on Cryptology and Information Security in Latin America, Guadalajara, Mexico, August 23-26, 2015, Proceedings, 2015
AVX2 is the newest instruction set on the Intel Haswell processor that provides simultaneous execution of operations over vectors of 256 bits. This work presents the advances on the applicability of AVX2 on the development of an efficient software implementation of the elliptic curve Diffie-Hellman protocol using the Curve25519 elliptic curve. Also, we will discuss some advantages that vector instructions offer as an alternative method to accelerate prime field and elliptic curve arithmetic. The performance of our implementation shows a slight improvement against the fastest state-of-the-art implementations.
```
@inproceedings{faz_latincrypt2015,
  author = {Armando Faz-Hernández and
            Julio López},
  title = {Fast Implementation of Curve25519 Using AVX2},
  booktitle = {Progress in Cryptology - LATINCRYPT 2015 - 4th International Conference
               on Cryptology and Information Security in Latin America, Guadalajara,
               Mexico, August 23-26, 2015, Proceedings},
  pages = {329--345},
  year = {2015},
  month = {aug},
  doi = {10.1007/978-3-319-22174-8_18},
  address = {Cham},
  editor = {Lauter, Kristin and Rodríguez-Henríquez, Francisco},
  isbn = {978-3-319-22174-8},
  publisher = {Springer International Publishing}
}
```

2014

Conference
Efficient and Secure Algorithms for GLV-Based Scalar Multiplication and Their Implementation on GLV-GLS Curves
Armando Faz-Hernández, Patrick Longa, Ana Helena Sánchez
Topics in Cryptology - CT-RSA 2014 - The Cryptographer's Track at the RSA Conference 2014, San Francisco, CA, USA, February 25-28, 2014. Proceedings, 2014
We propose efficient algorithms and formulas that improve the performance of side-channel protected scalar multiplication exploiting the Gallant-Lambert-Vanstone (CRYPTO 2001) and Galbraith-Lin-Scott (EUROCRYPT 2009) methods. Firstly, by adapting Feng et al.’s recoding to the GLV setting, we derive new regular algorithms for variable-base scalar multiplication that offer protection against simple side-channel and timing attacks. Secondly, we propose an efficient technique that interleaves ARM-based and NEON-based multiprecision operations over an extension field, as typically found on GLS curves and pairing computations, to improve performance on modern ARM processors. Finally, we showcase the efficiency of the proposed techniques by implementing a state-of-the-art GLV-GLS curve in twisted Edwards form defined over Fp^2, which supports a four dimensional decomposition of the scalar and runs in constant time, i.e., it is fully protected against timing attacks. For instance, using a precomputed table of only 512 bytes, we compute a variable-base scalar multiplication in 92,000 cycles on an Intel Ivy Bridge processor and in 244,000 cycles on an ARM Cortex-A15 processor. Our benchmark results and the proposed techniques contribute to the improvement of the state-of-the-art performance of elliptic curve computations. Most notably, our techniques allow us to reduce the cost of adding protection against timing attacks in the GLV-based variable-base scalar multiplication computation to below 10 percent.
```
@inproceedings{faz_ctrsa2014,
  author = {Armando Faz-Hernández and
            Patrick Longa and
            Ana Helena Sánchez},
  title = {Efficient and Secure Algorithms for GLV-Based Scalar Multiplication and Their Implementation on GLV-GLS Curves},
  booktitle = {Topics in Cryptology - CT-RSA 2014 - The Cryptographer's Track at the RSA Conference 2014, San Francisco, CA, USA, February 25-28, 2014. Proceedings},
  pages = {1--27},
  year = {2014},
  month = {feb},
  doi = {10.1007/978-3-319-04852-9_1},
  address = {Cham},
  editor = {Benaloh, Josh},
  isbn = {978-3-319-04852-9},
  publisher = {Springer International Publishing}
}
```
Abstract
On Software Implementation of Arithmetic Operations on Prime Fields using AVX2
Armando Faz-Hernández, Julio López
Anais do XIV Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais, 2014
AVX2 is the newest instruction set on the Intel Haswell processor that provides simultaneous execution of operations over vectors of data. This work presents advances in the applicability of AVX2 to the development of prime field arithmetic, which is a building block for the construction of Elliptic Curve Cryptosystems. Having as a goal the efficient and secure implementation of prime field arithmetic, we show some advantages that vector instructions offer compared with 64-bit implementations. In order to validate the results of our research, we present a benchmark obtained on a Haswell processor.
```
@inproceedings{faz_sbseg2014,
  author = {Armando Faz-Hernández and Julio López},
  title = {On Software Implementation of Arithmetic Operations on Prime Fields using {AVX2}},
  booktitle = {Anais do XIV Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais},
  location = {Belo Horizonte},
  publisher = {Sociedade Brasileira de Computação},
  address = {Porto Alegre, RS, Brasil},
  pages = {338--341},
  year = {2014},
  month = {nov},
  doi = {10.5753/sbseg.2014.20148}
}
```

2013

Unpublished
Yet Another Survey on SIMD Instructions
Armando Faz-Hernández
2013
Data-level parallelism enables a potential speed-up in applications that work with (almost) independent data sets. Parallel processing has been growing recently and is now commonly used in applications such as image and video processing, visualization, and scientific computation. The latest processor micro-architectures come equipped with special units that apply one operation to a set of data in a SIMD fashion. In this document, we survey the main features offered by different instruction sets that perform parallel operations. We focus on instruction sets designed for desktop and server computers, such as SSEx, AVX, and FMA, and on those for mobile and embedded computers with ARM-based processors, which support the NEON instruction set.
```
@unpublished{faz_yassimd2013,
  author = {Armando Faz-Hernández},
  title = {Yet Another Survey on SIMD Instructions},
  pages = {1--8},
  year = {2013},
  month = {jun}
}
```

2012

Thesis
Implementación multinúcleo de la multiplicación escalar en curvas de Koblitz
Armando Faz-Hernández
2012 · Master's Thesis
Elliptic curve cryptography is highly significant for secure computer applications: it provides mechanisms to ensure data privacy, authentication among communicating entities, and the integrity of messages sent over an insecure channel. Cryptographic algorithms exist today that provide these security services; however, some of them require a large amount of computation. Such is the case of scalar multiplication, which is a fundamental operation in the implementation of elliptic curve cryptography. It is, therefore, essential that this operation be performed efficiently. This thesis focuses on the analysis of algorithms and programming techniques that reduce the computation time of scalar multiplication. From the algorithmic standpoint, Koblitz elliptic curves allow scalar multiplication to be computed quickly by applying the Frobenius endomorphism, without using point doublings. The formulation of a parallel algorithm allows its implementation on a multicore processor. Extended instruction sets included in the latest computer architectures enable parallel processing of multiple data sets. Among these, the carry-less multiplier enhances the performance of operations over finite fields, thereby accelerating the computation of scalar multiplication. The results of this research show the speedup obtained by parallelizing scalar multiplication, optimizing both algorithmically and through the use of recent technologies.
```
@mastersthesis{faz_master_thesis,
  author = {Armando Faz-Hernández},
  title = {Implementación multinúcleo de la multiplicación escalar en curvas de Koblitz},
  school = {Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV)},
  address = {Distrito Federal, Mexico},
  type = {Master's thesis},
  year = {2012},
  month = {feb},
  keywords = {carryless multiplier, elliptic curves, binary fields, PCLMULQDQ, Koblitz curves},
  language = {es},
  note = {https://repositorio.cinvestav.mx/handle/cinvestav/4271}
}
```
Conference
Faster Implementation of Scalar Multiplication on Koblitz Curves
Diego F. Aranha, Armando Faz-Hernández, Julio López, Francisco Rodríguez-Henríquez
Progress in Cryptology - LATINCRYPT 2012 - 2nd International Conference on Cryptology and Information Security in Latin America, Santiago, Chile, October 7-10, 2012. Proceedings, 2012
We design a state-of-the-art software implementation of field and elliptic curve arithmetic in standard Koblitz curves at the 128-bit security level. Field arithmetic is carefully crafted by using the best formulae and implementation strategies available, and the increasingly common native support for binary field arithmetic in modern desktop computing platforms. The i-th power of the Frobenius automorphism on Koblitz curves is exploited to obtain new and faster interleaved versions of the well-known τNAF scalar multiplication algorithm. The τ^(m/3) and τ^(m/4) maps are employed to create analogues of the 3- and 4-dimensional GLV decompositions and, in general, the ⌊m/s⌋-th power of the Frobenius automorphism is applied as an analogue of an s-dimensional GLV decomposition. The effectiveness of these techniques is illustrated by timing the scalar multiplication operation for fixed, random and multiple points. In particular, our library is able to compute a random point scalar multiplication in just below 10^5 clock cycles, which sets a new speed record across all curves with or without endomorphisms defined over binary or prime fields. The results of our optimized implementation suggest a trade-off between speed, compliance with the published standards and side-channel protection. Finally, we estimate the performance of curve-based cryptographic protocols instantiated using the proposed techniques and compare our results to related work.
```
@inproceedings{aranha_latincrypt2012,
  author = {Diego F. Aranha and
            Armando Faz-Hernández and
            Julio López and
            Francisco Rodríguez-Henríquez},
  title = {Faster Implementation of Scalar Multiplication on Koblitz Curves},
  booktitle = {Progress in Cryptology - LATINCRYPT 2012 - 2nd International Conference
               on Cryptology and Information Security in Latin America, Santiago,
               Chile, October 7-10, 2012. Proceedings},
  pages = {177--193},
  year = {2012},
  month = {oct},
  doi = {10.1007/978-3-642-33481-8_10},
  address = {Berlin, Heidelberg},
  editor = {Hevia, Alejandro and Neven, Gregory},
  isbn = {978-3-642-33481-8},
  publisher = {Springer Berlin Heidelberg}
}
```

2011

Journal
Speeding scalar multiplication over binary elliptic curves using the new carry-less multiplication instruction
Jonathan Taverne, Armando Faz-Hernández, Diego F. Aranha, Francisco Rodríguez-Henríquez, Darrel Hankerson, Julio López
Journal of Cryptographic Engineering, 2011
The availability of a new carry-less multiplication instruction in the latest Intel desktop processors significantly accelerates multiplication in binary fields and hence presents the opportunity for reevaluating algorithms for binary field arithmetic and scalar multiplication over elliptic curves. We describe how to best employ this instruction in field multiplication and the effect on performance of doubling and halving operations. Alternate strategies for implementing inversion and half-trace are examined to restore most of their competitiveness relative to the new multiplier. These improvements in field arithmetic are complemented by a study on serial and parallel approaches for Koblitz and random curves, where parallelization strategies are implemented and compared. The contributions are illustrated with experimental results improving the state-of-the-art performance of halving and doubling-based scalar multiplication on NIST curves at the 112- and 192-bit security levels and a new speed record for side-channel-resistant scalar multiplication in a random curve at the 128-bit security level. The algorithms presented in this work were implemented on Westmere and Sandy Bridge processors, the latest generation Intel microarchitectures.
```
@article{taverne_jcen2011,
  author = {Jonathan Taverne and
            Armando Faz-Hernández and
            Diego F. Aranha and
            Francisco Rodríguez-Henríquez and
            Darrel Hankerson and
            Julio López},
  title = {Speeding scalar multiplication over binary elliptic curves using the new carry-less multiplication instruction},
  journal = {Journal of Cryptographic Engineering},
  volume = {1},
  number = {3},
  pages = {187--199},
  year = {2011},
  month = {sep},
  doi = {10.1007/s13389-011-0017-8}
}
```
Conference
Software Implementation of Binary Elliptic Curves: Impact of the Carry-Less Multiplier on Scalar Multiplication
Jonathan Taverne, Armando Faz-Hernández, Diego F. Aranha, Francisco Rodríguez-Henríquez, Darrel Hankerson, Julio López
Cryptographic Hardware and Embedded Systems - CHES 2011 - 13th International Workshop, Nara, Japan, September 28 - October 1, 2011. Proceedings, 2011
The availability of a new carry-less multiplication instruction in the latest Intel desktop processors significantly accelerates multiplication in binary fields and hence presents the opportunity for reevaluating algorithms for binary field arithmetic and scalar multiplication over elliptic curves. We describe how to best employ this instruction in field multiplication and the effect on performance of doubling and halving operations. Alternate strategies for implementing inversion and half-trace are examined to restore most of their competitiveness relative to the new multiplier. These improvements in field arithmetic are complemented by a study on serial and parallel approaches for Koblitz and random curves, where parallelization strategies are implemented and compared. The contributions are illustrated with experimental results improving the state-of-the-art performance of halving and doubling-based scalar multiplication on NIST curves at the 112- and 192-bit security levels, and a new speed record for side-channel resistant scalar multiplication in a random curve at the 128-bit security level.
```
@inproceedings{taverne_ches2011,
  author = {Jonathan Taverne and
            Armando Faz-Hernández and
            Diego F. Aranha and
            Francisco Rodríguez-Henríquez and
            Darrel Hankerson and
            Julio López},
  title = {Software Implementation of Binary Elliptic Curves: Impact of the Carry-Less Multiplier on Scalar Multiplication},
  booktitle = {Cryptographic Hardware and Embedded Systems - CHES 2011 - 13th International Workshop, Nara, Japan, September 28 - October 1, 2011. Proceedings},
  pages = {108--123},
  year = {2011},
  month = {sep},
  doi = {10.1007/978-3-642-23951-9_8},
  editor = {Preneel, Bart and Takagi, Tsuyoshi},
  isbn = {978-3-642-23951-9},
  publisher = {Springer Berlin Heidelberg},
  address = {Berlin, Heidelberg}
}
```