Email GMail google Intelwars machinelearning Malware Phishing

Deep Learning to Find Malicious Email Attachments

Google presented its system of using deep-learning techniques to identify malicious email attachments:

At the RSA security conference in San Francisco on Tuesday, Google’s security and anti-abuse research lead Elie Bursztein will present findings on how the new deep-learning scanner for documents is faring against the 300 billion attachments it has to process each week. It’s challenging to tell the difference between legitimate documents in all their infinite variations and those that have specifically been manipulated to conceal something dangerous. Google says that 63 percent of the malicious documents it blocks each day are different than the ones its systems flagged the day before. But this is exactly the type of pattern-recognition problem where deep learning can be helpful.


The document analyzer looks for common red flags, probes files if they have components that may have been purposefully obfuscated, and does other checks like examining macros­ — the tool in Microsoft Word documents that chains commands together in a series and is often used in attacks. The volume of malicious documents that attackers send out varies widely day to day. Bursztein says that since its deployment, the document scanner has been particularly good at flagging suspicious documents sent in bursts by malicious botnets or through other mass distribution methods. He was also surprised to discover how effective the scanner is at analyzing Microsoft Excel documents, a complicated file format that can be difficult to assess.

This is the sort of thing that’s pretty well optimized for machine-learning techniques.