I've been setting up a mail server lately and ran into a strange problem. When I open an mbox file, for example:
/var/spool/mail/root
the message bodies contain odd-looking 3D sequences:
<table cellpadding=3D"0" cellspacing=3D"0" style=3D"text-align:le=
ft;color:#454545;background-color:#fff;font-size:14px;border-radius:10px;pa=
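Note the extra 3D sequences in the middle of each line, and the extra = sign at the end of each line.
What on earth is this?
After some searching, it turns out this is quoted-printable encoding. Like Base64, quoted-printable is a content transfer encoding that is very common in email.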
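To make the comparison with Base64 concrete, here is a small sketch that encodes the same bytes both ways with Python's standard `quopri` and `base64` modules. The helper name `encode_both` and the sample input are made up for illustration. Quoted-printable keeps ASCII text mostly readable, while Base64 turns everything into an opaque alphabet.

```python
import base64
import quopri


def encode_both(raw):
    """Return (quoted-printable, base64) encodings of the same bytes."""
    return quopri.encodestring(raw), base64.b64encode(raw)


# Illustrative input taken from the HTML fragment above.
raw = b'style="text"'
qp, b64 = encode_both(raw)

print(qp)   # only the = sign is escaped, the rest stays readable
print(b64)  # nothing of the original text is recognizable

# Both encodings are reversible.
print(quopri.decodestring(qp) == raw, base64.b64decode(b64) == raw)
```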
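The basics:
If an = sign appears at the very end of a line, it is a soft line break: the line was only wrapped for transport and continues on the next line. So: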
he=
llo
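simply means hello, joined back together.
If =3D appears in the middle of a line, it stands for a single = sign (3D is the hexadecimal ASCII code of =).
So style=3D"text" means style="text".
Printable ASCII characters other than = are left as they are. Every other byte is encoded as = followed by the two hexadecimal digits of that byte, so a multi-byte character becomes several =XX groups.
That should clear things up.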
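The rules above can be checked with Python's standard `quopri` module. This is a minimal sketch: the helper name `decode_qp` is hypothetical, and it assumes the decoded bytes are UTF-8 text.

```python
import quopri


def decode_qp(text, charset="utf-8"):
    """Decode a quoted-printable string into text (assumes the given charset)."""
    return quopri.decodestring(text.encode("ascii")).decode(charset)


# A trailing = is a soft line break: the two lines are joined.
print(decode_qp("he=\nllo"))

# =3D is an escaped = sign.
print(decode_qp('style=3D"text"'))

# A non-ASCII character is encoded byte by byte, one =XX group per byte.
print(quopri.encodestring("中".encode("utf-8")))
```

The last line shows why Chinese text looks so noisy in a raw mbox: each character takes three bytes in UTF-8, so it becomes three =XX groups.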
Here is a Python program that processes an mbox file and can be used to read the mail in it:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import mailbox
from email.header import decode_header

filename = "/var/spool/mail/zrr"
SEPARATOR = "-" * 97


def to_text(data, charset):
    # Fall back to UTF-8 when the declared charset is missing or unknown.
    try:
        return data.decode(charset or "utf-8", errors="replace")
    except LookupError:
        return data.decode("utf-8", errors="replace")


def decode_mime_header(value):
    # Headers such as =?utf-8?q?...?= are MIME encoded words; decode each chunk.
    if value is None:
        return ""
    text = ""
    for chunk, charset in decode_header(value):
        text += to_text(chunk, charset) if isinstance(chunk, bytes) else chunk
    return text


mb = mailbox.mbox(filename)
for i, mes in enumerate(mb):
    print("\n\n\n\n\n")
    print(SEPARATOR)
    print("Email", i)
    print(SEPARATOR)
    em_from = decode_mime_header(mes.get("From"))
    subject = decode_mime_header(mes.get("Subject"))
    print("From: %s - Subject: %s" % (em_from, subject))
    print(SEPARATOR)
    # walk() yields every part, including nested ones; skip multipart containers.
    for part in mes.walk():
        if part.is_multipart():
            continue
        # decode=True undoes the Content-Transfer-Encoding (quoted-printable or base64).
        payload = part.get_payload(decode=True)
        if payload is not None:
            print(to_text(payload, part.get_content_charset()))
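
The check for =? in the program above is there because From and Subject headers use a related scheme, MIME encoded words, which wrap quoted-printable ("q") or Base64 ("b") text together with a charset name. Below is a small sketch using `decode_header` and `make_header` from the standard `email.header` module. The helper name `header_to_text` and both sample header values are made up for illustration.

```python
from email.header import decode_header, make_header


def header_to_text(raw_value):
    """Decode a raw header value that may contain MIME encoded words."""
    return str(make_header(decode_header(raw_value)))


# Hypothetical sample values: both encode the same two UTF-8 characters.
q_encoded = "=?utf-8?q?=E4=BD=A0=E5=A5=BD?="   # quoted-printable flavour
b_encoded = "=?utf-8?b?5L2g5aW9?="             # Base64 flavour

print(header_to_text(q_encoded))
print(header_to_text(b_encoded))

# A header without encoded words passes through unchanged.
print(header_to_text("plain ascii subject"))
```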