2006-07-01

Code

Here is the (sanitised) code which I have used to get this data.

I use 'index.pl' with mod_perl on apache2 to generate addresses for harvesters. The decoder is fed mail logs in batches using logtail and cron; its regex is written for postfix 2.2.


begin 644 spamtag.tar.gz
M'XL(`,;VID0``^U9;5/;2!+.U_A7]&%2MD&6WS$O2RK"%J"4L3E)AG!Q-B6D
M,=;&EAQ)AG`<]]NO9T:R96/#[@:RMQ5U*O)HIJ?[Z9[13'?CCXU18%P57KT@
M%9'JM1K[15K\9>U2L5HKU>J58JW^JE@JE4OU5U![25`13?S`\`!>>:X;/,;W
MU/C?E/QP_5M*0VYK\HOHH`N\M55=M?[E2K$2K7^Y6BSA^I?+I>HK*+X(F@7Z
MR=>_X8YO/?MJ$$#6S$&Y6*SE\;$E`-T8`V/B6.^N1H8]%$UWE$I)PR$P;A\\ 
MXA/OFEAB*J42R_8#S[Z<!+;K@.%8,/$)V`[X[L0S"?1=;R3`C1T,P/78KSL)
M8.1:=M\V#3I)`,,CJ3'Q1G80$`O&GGMM6]@(!D:`#RIC.'1O;.<*3->Q;#K)
MIY-@1(+=10P^N/U(N>E:R(3+C)`#`T%1:<:E>TV'(N,=-[!-(N"8[:>&*(D*
MB"MRK`44J,X<&O:(>.B!CG=E.+9OA+H]M-VRT8"),?2Y"12I[5R[0W09A"!\ 
M@EPH"E5-'-\=VJ:-MJ<(]3>;X(^)R5PT'-Y2EPSL2\H!?<\=H8OI7`H8+>T'
M-S@!D>C'B@9:YU`_EU09L'VJ=LZ4IMR$@PO0CV5H=$XO5.7H6(?C3JLIJQI(
M[2;VMG55.>CJ'>Q8DS2<N48'4E+[`N0/IZJL:=!103DY;2DH#*6K4EM79$T`
MI=UH=9M*^T@`%`#MC@XMY431D4WO"$QI."TUFP:=0SB1U<8QODH'2DO1+QB0
M0T5O4UV'J$R"4TG5E4:W):EPVE5/.YH,:%:JJ6B-EJ2<R$T1M:-&D,_DM@[:
ML=1J+5C9.6_+*H4^9^*!C!BE@Y9,%3$CFXHJ-W1JS:S50,<AO)8`VJG<4&A#
M_B"C+9)Z(80R-?F?763"06A*)]*1K*6R3W@$EZ315>43"AG=H'4/-%W1N[H,
M1YU.D_E9D]4S/)&UO52KHS%G=3590`VZQ!2C"/24MD?;!UU-83Y3VKJLJMU3
M7>FT<[B\Y^@5Q"CAU"9S;J?-3$4'==0+*I3Z@/E>@/-C&?M5ZD_F*8FZ0$./
M-?0X&^I#!^HQ&Z$M'[64([G=D.EHATHY5S0YATNE:)0!15*UYQ+J[#*3Z1(A
M*MY4M%2T806VD*`<@M0\4RCLD!F77E/";<)<UC@.W2VF_NHC]&]-T?V/!Q;Y
M)HZ'+Z&#Q7_UE?%?K5+=FM[_]5*-WO]U'$[N_Q]`Z7\4)KY7N+2=`MZ^0\C?
MZ*E4&OY06)#&"4LC`SKP;,%!&OY\>/`0R/=%"&GXPS$"A?"L84(:OB=0H'">
M.U9(P_-&"VEXIG@A#<\=,7!3GS%F2,.S1@UI>.:X@1O\')%#W-+GB!W2\%W1
M0XJ>1O1<,(,]UFX<*?#UIK"+]X)C&9Y5F';O[C8,;QR^>K?C8'?W`+_TONT/
MYCH;!XV]U#J;@)KUSR?2!PAI'XK1$&Y)NIT^=T];'0G7;Q]*>ZG1+:Q_Q:9#
M;JC&O12PKEMB>'NI+!Z[I"^L>C*N>$\.!5V-`CQ_LO210\V4!391U4X1@3#9
M>/L#S/!E,MA_3<PL'1"@*$"E3`51"7S&.WML6)8WG>&/AW:0+?3$@H#@\V\]
M,G(#\IGRY+A%]M@BYDS%.A?PL?@)-J!<V]K8J,!F"AY2Q%B:,I:1\36\CHV5
MIV,E.A8?JGS:BUE2BBSA<$+K37L\(![VSM8N_Q:]G[W+?"&W&=A_"YF-1;KH
M=%7VU0%^C"H>'+@%NRKN8=R^\@/NC8RPS+@893@*KBW:4CCK-3.',@R(844,
M'NY+=V1?9QZ7>I^+3*2618N\'YF<?TL<-L+<0WGG&2GO?\$O_*HR=<I95MS(
M%=9+!3\4BR'#;,(^7DUCP_R273O86!.F.G-3WH`N";V'*&\-\I#%?T/B7`6#
M;$Q4#G+P!L>CB1[.]&<[QY]<XJ>:C>NRG2!+/0+;N9P`^9@J(:Z7&3@'F3N#
M*Q#CYG#5;RXKY0A]I6P.#,\G`?]",#ZM5&M;]>T=X]+$[^QJ8/_V93ARW/%7
MSP\FU_0+PJ@&LFSOTV\>]FCCEWVHE'B3?H)HZATN(15_MV[?S\R+*40;;-RZ
MB/Y^AB7F]<Q*74N=&^FN,=VH/%S(OH?RJ&7%8@:=,<4QG<IQU!`'GT/AA9/"
MI=A>$Z:BIFSXG3$N9&,?8C2-G2O;C"UND;@?>B.<=X_&W4]/*1KM!"XWF[=W
M`\\8BQFV?#$ID'E'OAFC\9"(KG=%/?171]G_OQ3E?_@?G8Z!\`OD@(_G?Z52
M?5;_Q\2/UG\KU7HYR?]^!"WD?TGNE^1^2>Z7Y'X_:^[W5%;'(L,;0KY8QJW/
M0\@LAEPL+M?P9#1N,QBN\?<3-WPO\W=]0GS>4>$=Y\1RHJYJR#.8>&%/C?<<
M>C9_WPJU&,'$XSWU.;UAR/QFY#K!P(\"W"FZ]X8S,;P8O$-RZ84](<`3PS,'
M,WC2V+.',V@G<53O)PZ987H_&<;P2),K/$_Q?3O$1\8!&5UB^B+`#N_JF('+
M.TI%UM'&`S=D*9583Y.8O&?.+B/`>X3;12UB"1$F=M08UBX+S`S6K@C,!M:N
M"MP`VJX)##]K;['VD+7K`L/.VML"P\W:.P(#S'5AW$K!\A>J&7'REW+NJ902
M?F1.N3*I?&K:GT@UI[GF#TN:PK1I=<)TSQ+]:=I$;\#?5T2).)\HI$1LL6+*
MS<`>DNPO;UEJE>:7KD>NR#?`QFAB#L!$WV(@0Z][?KT[Y!M&$@X113%E]W%3
M%\:N'_3M;[V"/PK&EKA);W%Q,W![^[U?/AKY?TOY?WVZJ]SWQ.S'C[O?+/O*
M#G8_?;JK;M__YV,QOV/DKZ7\&7+LW.=Z41;4$QT2%'C*Q[7\FNW=;.9Z_F:V
M9VWF@#Y[N]$C)VXNHNA]9"SBW"/7^\3Q46Q99'HG8M_;"&TV!C?7$Q?0S8,3
M-P9DR&>)&RACO1`EJ#.BRT@C4(RH1G@2T/6='@EWZR6:*ZY@Q].1Y?KEQU@"
MPDI9GNT$_2RLO2E6K?R;8ID_:&X[W1AS,(0Y+;G5&NC&H2`JJUFHLRE+=34+
M"^J0I;:$!5,XVPK?D65K"0MQS&GU8!^&F)'7'X%,UX1*VE[",JN5+(Z,#6N^
MSA)6*>)\CU<L9BAS"Q_^@V-HKLC`[8Z?"S-)TW-A[Z$06IJ(%S7"Z@05%LK>
MXX5%6*H_JI]P_=/:%"N(1"(7O1S-[=O7!,/UL)02J[MP'/G:LHD++J95DYB<
MA0GW#U?H81DL4CPG6(A5Z.8&:(EN&]>#U6]BL"*A84T(PEI@3-VB-4O`>?9U
M;(MFHNHC*_*LWG1S9<M9B1.7CY<X9U*7[7=^MD>S%XO?#R=DUWUB/KQ)UT>V
MLZ07<UIO23>>%VZ?G6#+)*WH9P?0P^XP(GTXPB^QY=VYR.#P:@O=L&COXM\,
MXF/4>;C#K\/#+7Y\EJW=Z8/N`N8%[B*@[ENV#E34[SB)^2F<Y4[:+.6$N#.7
MGL'Q/T#$%[BT8H'773,@`0:B0,O+X?0"_3,#;&Q`97'C\_'\_G3>QI1UE>SR
M"MGE)V279[*776>,I[)"=ND)V969[-(JV=7X'U"6K5_XQZ%]6`N=(8;`P]]*
M^%M=6Q#`%CRV!Y!$R.`_<1[T=+]Q!N$!0_P>7,H0@[F<(1X4+,40O],?D\!N
M],<8V!7[&`.[\$58ZSEKL\6[3_'G?5+93BBAA!)***&$$DHHH8022BBAA!)*
?**&$$DHHH8022BBAA!)***&$$OK9Z7\[X%/U`%``````
`
end

2006-06-23

'Leni Neto' -- Parana, Brazil

'%CUSTOM_FINANCIAL_TERMS@gmail.com'

You may have received some of these in the last couple of months. The local part is some 'custom financial terms' (when they fill in the template correctly), followed by a random word, separated by underscores.

There are a few harvest IPs. This one's in Brazil, and has the name 'Leni Neto':

$ whois NET-64-15-139-32-1

CustName: Leni Neto
Address: candido de abreu 107
City: Parana
StateProv:
PostalCode: 83280
Country: BR
RegDate: 2006-05-25
Updated: 2006-05-25

There's also

$ whois 209.200.225.93

OrgName: ADDD2NET COM INC DBA LUNARPAGES
OrgID: ACIDL
Address: Add2Net, Inc.
Address: Lunarpages Division
Address: 100 East La Habra Blvd.
City: La Habra
StateProv: CA

$ whois 65.23.156.33

OrgName: Datarealm Internet Services
OrgID: DIS-91
Address: PO Box 1616
City: Hudson
StateProv: WI
PostalCode: 54016
Country: US

and finally

$ whois 72.2.24.104

OrgName: Big Pipe Inc.
OrgID: BGPP
Address: Suite 400
Address: 630 - 3rd Ave. SW
City: Calgary
StateProv: AB
PostalCode: T2P-4L4
Country: CA

I'm guessing the last three are colo for hosting spam content. Anyone in Parana know 'Leni'?

'Digital Infinity Ltd' -- Moscow, RU

A week ago, they harvested my addresses four times on four consecutive days. They then proceeded to attempt to send mail to these ~100 unique addresses around 3000 [update 7/1: 5000] times.

This is a commercial organisation, hosted in the US, registered in Moscow.

The harvesting IPs are all in this small, 14-host netblock:

$ whois NET-208-66-195-0-1

OrgName: Digital Infinity Ltd
OrgID: DIL-32
Address: Ostrovityanova str, 14, 200
City: Moscow
Country: RU

NetRange: 208.66.195.0 - 208.66.195.15
CIDR: 208.66.195.0/28
NetName: DIGITALINFINITY
NetHandle: NET-208-66-195-0-1
Parent: NET-208-66-192-0-1
NetType: Reassigned
RegDate: 2006-05-31
Updated: 2006-05-31

RAbuseHandle: SUPPO189-ARIN
RAbuseName: Support
RAbusePhone: +7 (495) 980-6635
RAbuseEmail: noc@digitalinfinity.org

However, the addresses they're sending from are trojaned home and/or office computers, or their ISPs' mail exchangers. I have recorded over a thousand unique hosts so far.
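As a quick check on the "14-host" figure for the harvesting netblock, here is a minimal Python sketch using the standard ipaddress module. The helper name in_harvest_block is my own and purely illustrative; only the CIDR comes from the whois output above.

```python
import ipaddress

# The CIDR reported by whois for the harvesting netblock.
block = ipaddress.ip_network("208.66.195.0/28")

# A /28 spans 16 addresses; dropping the network and broadcast
# addresses leaves 14 usable host addresses.
usable_hosts = list(block.hosts())
print(block.num_addresses, len(usable_hosts))
print(block.network_address, block.broadcast_address)


def in_harvest_block(ip: str) -> bool:
    """Hypothetical helper: is this client IP inside the netblock?"""
    return ipaddress.ip_address(ip) in block


print(in_harvest_block("208.66.195.7"))
print(in_harvest_block("208.66.196.7"))
```

A membership test like this is one way to flag page loads from the block when reading web server logs.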

And now, data

Preamble complete; now for some data -- or at least, its first derivative; the raw data would compromise my anonymity.

So, I'll have to post only the bits that I deem to be interesting. If you want raw data, make your own once I've made the code available, or convince me via private email that you're not a spammer.

Technical details

A few words follow explaining how the tagged email addresses work. Details have been anonymized.

The tag encodes two 32-bit numbers: the Unix epoch time and the client's IPv4 address. These are concatenated, then encrypted using a standard library. The encryption is not required, but was added due to the initially covert nature of this project. The encrypted binary string is then encoded using base 32. The first implementation used a nearly standard base 64 encoding, but spammers fold case in nearly all instances, which destroys the information carried by upper and lower case. Hexadecimal survives case folding but is only 80% as efficient as base 32 (4 bits per character against 5), and as a smaller tag is desirable, a base 32 encoder was written and used.
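The packing and encoding steps can be sketched roughly as follows. This is a Python stand-in, not the original Perl, and it makes two loud assumptions: the encryption step is skipped entirely, and Python's standard RFC 4648 base 32 alphabet is used in place of the hand-written encoder. The names make_payload and make_tag are made up for illustration.

```python
import base64
import ipaddress
import struct
from datetime import datetime, timezone


def make_payload(epoch_seconds: int, client_ip: str) -> bytes:
    """Concatenate two big-endian 32-bit numbers: time, then IPv4 address."""
    ip_as_int = int(ipaddress.IPv4Address(client_ip))
    return struct.pack("!II", epoch_seconds, ip_as_int)


def make_tag(payload: bytes) -> str:
    """Base 32 encode the payload, lower-cased so case folding is harmless."""
    # ASSUMPTION: the real code encrypts the payload before this point.
    encoded = base64.b32encode(payload).decode("ascii")
    return encoded.rstrip("=").lower()


harvest_time = datetime(2006, 6, 23, 15, 7, 51, tzinfo=timezone.utc)
harvest_epoch = int(harvest_time.timestamp())
payload = make_payload(harvest_epoch, "10.0.1.1")
tag = make_tag(payload)

# Compare tag lengths for the same 8 bytes: hex needs 16 characters,
# base 32 needs 13.
print(len(payload), len(payload.hex()), len(tag))
```

Because this sketch leaves out the encryption, its tags come out shorter than the ones the real code produces.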

The resulting tagged addresses look (something) like this:

aa5kfudcr6i71eiofvcifmvrlzk74s5s@example.org

The 'aa' is an arbitrary prefix and encodes no data. I use a number of different prefixes in each page delivered, both to increase the number of tagged addresses in the spammers' database and to glean additional information on whether spammers will use all the addresses they harvest, and in what order if any.
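A small Python sketch of the prefix idea follows. It assumes every prefix is two characters long, as 'aa' is; the prefixes 'ab' and 'ac' and both function names are invented for illustration.

```python
def tagged_addresses(tag: str, prefixes: list[str], domain: str) -> list[str]:
    """Build one bait address per prefix, all sharing the same tag."""
    return [f"{prefix}{tag}@{domain}" for prefix in prefixes]


def split_prefix(address: str, prefix_len: int = 2) -> tuple[str, str]:
    """Split an address's local part into (prefix, tag).

    ASSUMPTION: prefixes have a fixed length of prefix_len characters.
    """
    local_part = address.split("@", 1)[0]
    return local_part[:prefix_len], local_part[prefix_len:]


addresses = tagged_addresses(
    "5kfudcr6i71eiofvcifmvrlzk74s5s", ["aa", "ab", "ac"], "example.org"
)
for address in addresses:
    print(address, split_prefix(address))
```

Since all addresses on one page share a tag, seeing which prefixes turn up in the mail logs, and in what order, shows how the harvested list is being used.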

The above email address, if used, will reveal the exact time and location it was originally harvested from; in this case, the IP address of the local machine, and the UTC time at which I loaded the page earlier today:

10.0.1.1
2006-06-23 15:07:51
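Going the other way, decoding might look something like this Python sketch. It again skips the decryption step and assumes the standard RFC 4648 base 32 alphabet, so it will not decode the real example address above; it builds its own sample tag instead. The name decode_tag is hypothetical.

```python
import base64
import ipaddress
import struct
from datetime import datetime, timezone


def decode_tag(tag: str) -> tuple[datetime, str]:
    """Recover (harvest time in UTC, harvester IPv4 address) from a tag."""
    # Restore the case and padding that were stripped when encoding.
    padded = tag.upper() + "=" * (-len(tag) % 8)
    payload = base64.b32decode(padded)
    # ASSUMPTION: the real code would decrypt the payload at this point.
    epoch_seconds, ip_as_int = struct.unpack("!II", payload)
    harvested_at = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return harvested_at, str(ipaddress.IPv4Address(ip_as_int))


# Build an unencrypted sample tag for the time and address shown above.
sample_time = datetime(2006, 6, 23, 15, 7, 51, tzinfo=timezone.utc)
sample_payload = struct.pack(
    "!II", int(sample_time.timestamp()), int(ipaddress.IPv4Address("10.0.1.1"))
)
sample_tag = base64.b32encode(sample_payload).decode("ascii").rstrip("=").lower()

when, where = decode_tag(sample_tag)
print(where)
print(when.strftime("%Y-%m-%d %H:%M:%S"))
```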

By combining this with the logs from the MTA daemon, we have both harvest and spam data in the same record:

2006-06-23 15:07:51,aa,10.0.1.1,2006-06-23 15:09:22,10.0.1.1,mailhost.example.org,spam-from@example.org

(that's: time harvested, address prefix, harvest IP, time spammed, spammer IP, spammer HELO, spammer FROM envelope sender)
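To show what can be done with such a record, here is a minimal Python sketch that parses the sample line above and works out the delay between harvest and spam. The class, field and function names are my own labels for the seven columns, not names taken from the real code.

```python
import csv
from dataclasses import dataclass
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class HarvestSpamRecord:
    harvested_at: datetime
    prefix: str
    harvest_ip: str
    spammed_at: datetime
    spammer_ip: str
    helo: str
    envelope_from: str

    @property
    def delay(self) -> timedelta:
        """Time between the address being harvested and being spammed."""
        return self.spammed_at - self.harvested_at


def parse_record(line: str) -> HarvestSpamRecord:
    """Parse one comma-separated harvest/spam record."""
    fields = next(csv.reader([line]))
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return HarvestSpamRecord(
        harvested_at=datetime.strptime(fields[0], TIME_FORMAT),
        prefix=fields[1],
        harvest_ip=fields[2],
        spammed_at=datetime.strptime(fields[3], TIME_FORMAT),
        spammer_ip=fields[4],
        helo=fields[5],
        envelope_from=fields[6],
    )


record = parse_record(
    "2006-06-23 15:07:51,aa,10.0.1.1,2006-06-23 15:09:22,"
    "10.0.1.1,mailhost.example.org,spam-from@example.org"
)
print(record.prefix, record.harvest_ip, record.spammer_ip)
print(record.delay.total_seconds())
```

For the sample line the delay is 91 seconds, because it was a test from my own machine. With real spammers, the gap and the difference between the two IP columns are the interesting parts.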

Beginning

Around a year ago, I had an idea. Well, I have a few, but this one gained some traction in that it actually produced working code. The code was (is) sloppy and unglamorous, but serves its purpose.

Having placed spamtrap addresses on my website for some years to add to the means of stopping spam from reaching my inbox, I wondered if there was more that I could do with a bait email address. Could the address itself encode information about the host which harvested it? It was obvious that it could; the question was whether I could be bothered to build it. I spent a few hours checking whether this was an old idea -- or at least, one which had been made public -- then wrote the code.

While spammers are careful to conceal their identities when spamming, often they are less so when harvesting email addresses. Spam from botnet, harvest from home.

I'm doing this anonymously in case I upset any spammers to the point where they could be bothered to pay me a visit. However, I still want to get the data out, so I'll use this blog as a convenient place to do so.