Jul 5, 2021 10:00 AM

Echo Dots Store a Wealth of Data—Even After You Reset Them

Thinking about selling your smart speaker? Be aware that you can't completely delete personal content from the device.


Like most Internet-of-things devices these days, Amazon’s Echo Dot gives users a way to perform a factory reset so that, as the corporate behemoth says, users can “remove any ... personal content from the applicable device(s)” before selling or discarding them. But researchers have recently found that the digital bits that remain on these reset devices can be reassembled to retrieve a wealth of sensitive data, including passwords, locations, authentication tokens, and other things.

Most IoT devices, the Echo Dot included, use NAND-based flash memory to store data. Like traditional hard drives, NAND—which is short for the boolean operator "not and"—stores bits of data so they can be recalled later. But whereas hard drives write data to magnetic platters, NAND uses silicon chips. NAND is also less stable than hard drives because reading and writing to it produces bit errors that must be corrected using error-correcting code.

NAND is usually organized in planes, blocks, and pages. This design allows for a limited number of erase cycles, usually in the neighborhood of 10,000 to 100,000 times per block. To extend the life of the chip, blocks storing deleted data are often invalidated rather than wiped. True deletions usually happen only when most of the pages in a block are invalidated. This process is known as wear-leveling.
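To make the invalidate-rather-than-wipe behavior concrete, here is a minimal toy model in Python. It is a sketch under loud assumptions, not how the Echo Dot's flash controller works: the class names, the four-page block size, and the rule that a block is erased only once none of its pages are valid are all simplifications invented for illustration.

```python
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 4  # assumed toy value; real chips have far more pages per block


@dataclass
class Block:
    pages: list = field(default_factory=lambda: [None] * PAGES_PER_BLOCK)
    valid: list = field(default_factory=lambda: [False] * PAGES_PER_BLOCK)
    erase_count: int = 0

    def erase(self):
        # A true erase: contents are gone and one erase cycle is spent.
        self.pages = [None] * PAGES_PER_BLOCK
        self.valid = [False] * PAGES_PER_BLOCK
        self.erase_count += 1


class ToyFlash:
    """Hypothetical flash translation layer mapping names to physical pages."""

    def __init__(self, num_blocks=2):
        self.blocks = [Block() for _ in range(num_blocks)]
        self.mapping = {}  # logical name -> (block index, page index)

    def write(self, name, data):
        self.delete(name)  # any older copy is invalidated, not wiped
        for b, block in enumerate(self.blocks):
            for p in range(PAGES_PER_BLOCK):
                if block.pages[p] is None:
                    block.pages[p] = data
                    block.valid[p] = True
                    self.mapping[name] = (b, p)
                    return
        raise RuntimeError("no free pages")

    def delete(self, name):
        loc = self.mapping.pop(name, None)
        if loc is not None:
            b, p = loc
            self.blocks[b].valid[p] = False  # marked stale; the data stays put

    def garbage_collect(self):
        # Simplification: erase a block only when it holds no valid pages.
        for block in self.blocks:
            has_data = any(page is not None for page in block.pages)
            if has_data and not any(block.valid):
                block.erase()

    def raw_dump(self):
        # What a chip-off read would see: every written page, valid or not.
        return [page for block in self.blocks for page in block.pages
                if page is not None]


flash = ToyFlash()
flash.write("wifi_psk", "hunter2")
flash.write("token", "abc123")
flash.delete("wifi_psk")  # the "reset" removes the mapping only
flash.garbage_collect()
leftover = flash.raw_dump()
```

In this toy run, `leftover` still contains the deleted password, because the block it lives in also holds a valid page and so is never erased. That is the same reason data can survive a reset on real hardware.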

Researchers from Northeastern University bought 86 used devices on eBay and at flea markets over a span of 16 months. They first examined the purchased devices to see which ones had been factory reset and which hadn’t. Their first surprise: 61 percent of them had not been reset. Without a reset, recovering the previous owners' Wi-Fi passwords, router MAC addresses, Amazon account credentials, and information about connected devices was relatively easy.

The next surprise came when the researchers disassembled the devices and forensically examined the contents stored in their memory.

“An adversary with physical access to such devices (e.g., purchasing a used one) can retrieve sensitive information such as Wi-Fi credentials, the physical location of (previous) owners, and cyber-physical devices (e.g., cameras, door locks),” the researchers wrote in a research paper. “We show that such information, including all previous passwords and tokens, remains on the flash memory, even after a factory reset.”

Used Echo Dots and other Amazon devices can come in a variety of states. One is that the device remains provisioned, as 61 percent of the purchased Echo Dots were. Others are that the device was reset while connected to the previous owner’s Wi-Fi network or reset while disconnected from Wi-Fi, in either case with or without the device having been deleted from the owner’s Alexa app.

Depending on the type of NAND flash and the state of the previously owned device, the researchers used several techniques to extract the stored data. For reset devices, there’s a process known as chip-off, which involves disassembling the device and desoldering the flash memory. The researchers then use an external device to access and extract the flash contents. This method requires a fair amount of equipment, skill, and time.

A different process called in-system programming allows the researchers to access the flash without desoldering it. It works by scratching some of the solder mask coating off of the printed circuit board and attaching a conductive needle to an exposed piece of copper to tap into the signal trace, which connects the flash to the CPU.

The researchers also created a hybrid chip-off method that causes less damage and thermal stress to the PCB and the embedded multi-chip package. Such damage can cause short circuits and broken PCB pads. The hybrid technique uses a donor multi-chip package for the RAM while accessing the embedded MultiMediaCard (eMMC) portion of the original multi-chip package externally. This method is mostly interesting to researchers who want to analyze IoT devices.

In addition to the 86 used devices, the researchers bought six new Echo Dot devices and, over a span of several weeks, provisioned them with test accounts at different geographic locations and different Wi-Fi access points. The researchers paired the provisioned devices to different smart home and Bluetooth devices. The researchers then extracted the flash contents from these still-provisioned devices using the techniques described earlier.

After extracting the flash contents from their six new devices, the researchers used the Autopsy forensic tool to search the embedded multimedia card images, and they analyzed the NAND dumps manually. They found the name of the Amazon account owner multiple times, along with the complete contents of the wpa_supplicant.conf file, which stores a list of networks the devices have previously connected to, along with the encryption keys they used. Recovered log files also provided lots of personal information.
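As a rough illustration of why a surviving wpa_supplicant.conf is so revealing, the following Python sketch carves `network={...}` blocks out of a raw byte dump. It is not the researchers' tooling. The function name `carve_networks` and the sample dump are hypothetical, and the pattern assumes the standard plaintext layout of that file, so it would miss fragmented or partially overwritten copies and would truncate a value containing a closing brace.

```python
import re

# Matches one network={ ... } stanza anywhere in the dump.
NETWORK_BLOCK = re.compile(rb"network=\{(.*?)\}", re.DOTALL)
# Matches one key=value line inside a stanza.
FIELD = re.compile(rb"^\s*(\w+)=(.*?)\s*$", re.MULTILINE)


def carve_networks(dump: bytes) -> list[dict[str, str]]:
    """Return the key/value fields of every network stanza found in dump."""
    networks = []
    for match in NETWORK_BLOCK.finditer(dump):
        fields = {}
        for key, value in FIELD.findall(match.group(1)):
            text = value.decode("utf-8", "replace").strip('"')
            fields[key.decode("ascii")] = text
        if "ssid" in fields:  # ignore stanzas without a network name
            networks.append(fields)
    return networks


# Hypothetical dump: binary padding around one stored network.
sample_dump = (
    b"\x00\xff\x00"
    + b'network={\n\tssid="HomeNet"\n\tpsk="correct horse"\n\tkey_mgmt=WPA-PSK\n}\n'
    + b"\xff" * 8
)
found = carve_networks(sample_dump)
```

Here `found` holds a single dictionary with the network name, the passphrase, and the key management setting, all readable without any knowledge of the device beyond the file format.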

Because the researchers provisioned the devices themselves, they knew what kinds of information the devices stored. They used this knowledge to create a list of keywords to locate specific types of data in four categories: information about the owner, Wi-Fi–related data, information about paired devices, and geographic information. Knowing what kinds of data are on the device can be helpful, but it’s not necessary for carrying out the attack.
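A keyword search of this kind can be sketched in a few lines of Python. The four categories come from the description above. The keywords themselves, the function name `find_hits`, and the sample dump are placeholders invented for illustration, not the researchers' actual list.

```python
# Placeholder keywords per category; the real list is not given in the article.
KEYWORDS = {
    "owner": [b"customerName", b"email"],
    "wifi": [b"ssid=", b"psk="],
    "paired_devices": [b"bluetooth", b"friendlyName"],
    "location": [b"latitude", b"postalCode"],
}


def find_hits(dump: bytes, keywords=KEYWORDS, context=16):
    """Map each category to (offset, snippet) pairs for every keyword match."""
    hits = {category: [] for category in keywords}
    for category, words in keywords.items():
        for word in words:
            start = dump.find(word)
            while start != -1:
                # Keep a few bytes after the keyword to show the stored value.
                snippet = dump[start:start + len(word) + context]
                hits[category].append((start, snippet))
                start = dump.find(word, start + 1)
    return hits


# Hypothetical dump with one Wi-Fi entry and one location entry.
sample = b"\x00" * 4 + b'ssid="HomeNet"' + b"\xff" * 3 + b"postalCode=02115"
report = find_hits(sample)
```

Recording byte offsets alongside each snippet lets an analyst jump back into the dump and inspect the surrounding pages by hand.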

After dumping and analyzing the recovered data, the researchers reassembled the devices. The researchers wrote:

Our assumption was that the device would not require an additional setup when connected at a different location and Wi-Fi access point with a different MAC address. We confirmed that the device connected successfully, and we were able to issue voice commands to the device. When asked “Alexa, Who am I?” the device would return the previous owner’s name. The reconnection to the spoofed access point did not produce a notice in the Alexa app nor a notification by email. The requests are logged under “Activity” in the Alexa app, but they can be deleted via voice commands. We were able to control smart-home devices, query package delivery dates, create orders, get music lists and use the “drop-in” feature. If a calendar or contact list was linked to the Amazon account, it was also possible to access it. The exact amount of functionality depends on the features and skills the previous owner had used. Before and after a factory reset the raw NAND flash was extracted from our provisioned devices using the Chip-Off method. Additionally, we created a dump using the eMMC interface. To find information in the resulting dumps, we had to develop a method to identify interesting information.

Dennis Giese, one of the Northeastern University researchers who wrote the paper, expanded on the attack scenario in an email, writing:

One of the queries is “Alexa, Who am I,” and the device will tell the owner's name. All services that the previous owner used are accessible. For example, you can manage your calendar through the Echo. Also, the Echo will get notifications when packages are about to arrive or you can use the Drop-In feature (as in, talking to another Echo of yours). If someone does not use any smart-home devices, then you obviously cannot control them. One special thing is door locks, where, by default, Alexa only allows you to lock them. A user needs to manually allow Alexa to enable the unlock feature … which, to our knowledge, only works through the app. So if a user did not enable that feature, you cannot open doors.

While the Echo Dot wouldn’t provide the previous owner’s address through voice commands, the researchers were able to find the rough location by asking questions about nearby restaurants, grocery stores, and public libraries. In some of the experiments, locations were accurate down to 150 meters. In some cases—such as when the device user had multiple Wi-Fi routers or neighbors’ SSID names were stored—the researchers could use the Google localization API, which is more precise still.

When Echo Dots were reset, the data extraction required more sophistication. In the event the reset was done when the device was disconnected from the owner’s Wi-Fi network and the user didn’t delete the device from their Alexa app, the recovered data included the authentication token needed to connect to the associated Amazon account. From there, the researchers could do the same things possible with non-reset devices, as described earlier.

When devices were reset while connected to the Wi-Fi network or had been deleted from the Alexa app, the researchers could no longer access the associated Amazon account, but in most cases they could still obtain Wi-Fi SSID names and passwords and MAC addresses of the connected router. With the SSID and the router's MAC address, it’s usually possible to learn the rough location of the device using search sites such as WiGLE.

Giese summarized the results this way:

If a device has not been reset (as in 61 percent of the cases), then it's pretty simple: You remove the rubber on the bottom, remove four screws, remove the body, unscrew the PCB, remove a shielding and attach your needles. You can dump the device then in less than 5 minutes with a standard eMMC/SD Card reader. After you got everything, you reassemble the device (technically, you don't need to reassemble it as it will work as is), and you create your own fake Wi-Fi access point. And you can chat with Alexa directly after that.

If the device has been reset, it gets more tricky and will involve some soldering. You will at least get the Wi-Fi credentials and potentially the position of the Wi-Fi using the MAC address. In some rare cases, you might be able to connect it to the Amazon cloud and the previous owner's account. But that depends on the circumstances of the reset.

Ethical considerations prevented the researchers from performing experiments that would reveal personal information about the previous owners. The results of the experiments they were able to do were consistent with the results from their six test devices, and there’s no reason to believe the used devices wouldn’t behave the same way. That means the 61 percent of used devices that had not been reset held a wealth of personal information about the previous owner that was fairly easy for someone with modest means to extract.

The researchers also developed a privacy-preserving scheme to indicate when devices still stored this information. They didn’t save or use any of it to demonstrate additional attacks. They didn’t find any personal data on six additional Amazon-certified refurbished devices they obtained.

The researchers proposed several ways to better protect data from extraction on used devices. The most effective, they said, was to encrypt the user data partition. This mitigation would solve multiple problems.

First, a physical attack on a provisioned device cannot extract user data and credentials in a simple fashion anymore as a data dump would only contain encrypted information to which an attacker needs to retrieve the respective key first. This would protect the user credentials even if a reset was not possible nor performed. Second, most of the issues with wear-leveling are mitigated as all blocks are stored encrypted. The identification and reassembly of such blocks becomes very difficult. Also, the correct identification and reconstruction of traces of a deleted key is in our opinion not possible or very unlikely.

The researchers believe that the solution can be implemented in a firmware update and wouldn’t degrade performance for most devices. Devices that don’t have enough computing power could still encrypt Wi-Fi passwords, authentication tokens, and other sensitive data. That alternative isn’t as effective as encrypting the entire user partition, but it would still make data extraction much harder and more costly.

Encrypting the user data partition or sensitive data on it requires some accommodations for protecting the encryption key without hindering usability, Guevara Noubir, coauthor of the research paper, said in an email. For smartphones, encryption keys are protected with a PIN or password. But IoT devices like the Echo Dot are expected to work after a reboot without user interaction. Technical solutions exist, but they require some level of design and implementation effort.

Asked if Amazon was aware of the findings or disagreed with them, a company spokesperson wrote: “The security of our devices is a top priority. We recommend customers deregister and factory reset their devices before reselling, recycling, or disposing of them. It is not possible to access Amazon account passwords or payment card information, because that data is not stored on the device.”

On background, the spokesperson also noted points the researchers already made, specifically that:

The company is working on mitigations
The attacks require the attacker to have physical possession of a device and specialized training
For devices that are successfully reset while connected to the Internet, the information remaining in memory doesn’t give an adversary access to a user’s Amazon account
Amazon wipes any data remaining on devices available through Amazon trade-ins or returns

The threats demonstrated in the research most likely apply to Fire TV, Fire Tablets, and other Amazon devices, though the researchers didn’t test them. The results are also likely to apply to many other NAND-based devices that don’t encrypt user data, including the Google Home Mini.

Giese said that he believes Amazon is working on ways to better secure the data on the devices it manufactures. Until then, truly paranoid users who have no further use for their devices have little option other than to physically destroy the NAND chip inside. For the rest, it’s important to perform a factory reset while the device is connected to the Wi-Fi access point where it was provisioned.

Giese said that resets don’t always work as expected, in part because it’s hard to differentiate between a Wi-Fi password reset (pressing reset for 15 seconds) and a factory reset (pressing reset for at least 25 seconds). He suggested that owners verify that the device was reset. For Echos, users can do this by power-cycling the device and seeing if it connects to the Internet or enters setup mode. Owners should also double-check that the device no longer appears in the Alexa app.

“While a reset still leaves data, you make it harder to extract the information (chip-off method) and invalidate the access of the device to your Amazon account,” he said. “Generally, and for all IoT devices, it might be a good idea to rethink if reselling it is really worth it. But obviously that might not be the best thing for the environment.”

This story originally appeared on Ars Technica.