Analyzing Taobao Live Danmaku over the Long-Lived Connection (No Garbled Text)

Taobao's powermsg is a truly remarkable thing. Since it is now obsolete, I'm taking the chance to talk about it:

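  • The powermsg interfaces originally let you fetch pv, uv, and even the current number of online viewers directly. If I remember correctly, this was blocked in the second half of 2019
  • After those powermsg interfaces were blocked, the same data could still be obtained through websocket or nativemsg, just less conveniently. This was blocked in the second half of 2020
  • Taobao's "block" was crude: it simply replaced the data of Taobao Live topics with fixed or random values, so it has no effect on analyzing the protocol
  • Taobao Live's powermsg (the web version, at least) has now been taken offline. 1688 Live shares the same origin and is worth a look; everything below is based on 1688 Live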

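Getting the data

nativemsg

Simple and convenient: call the interface directly; all you need is a topic. The data returned looks like this

timestampList->data holds the actual payload. Strip all whitespace from it first; it is not compressed, just plain base64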

{
"configversion": "0",
"offset": "1655976924044",
"period": "10",
"role": "0",
"timestampList": [
{
"data": "MAkhCAFYAAokZWVlNmMxODktMGYwOC00ODE4LWJiNGQtZDMwMDU1Yjg3ZWIwMiBhZWI1YWRmODZi\nY2E4N2EyY2NiYTQzNjBlZGNmMmQ5MzjA6gFCCjEwOTQxMDMxNDIcABIP5aaC5L2V6YCJ5oupNTIw\nIJjdqf+YMDICdGJlAHsiYnV5ZXJMZXZlbCI6IkwwIiwiYnV5ZXJMZXZlbFBpYyI6Imh0dHBzOi8v\nZ3cuYWxpY2RuLmNvbS90ZnMvVEIxU1lmaG1jaWViMThqU1pGdlhYYUkzRlhhLTI2LTI2LnBuZyJ9\n",
"msgId2DataId": {
"aeb5adf86bca87a2ccba4360edcf2d93": "4cPKVSo0BHbvTiro"
},
"offset": "1655976915024"
},
{
"data": "NAkhCAGOAAokZWVlNmMxODktMGYwOC00ODE4LWJiNGQtZDMwMDU1Yjg3ZWIwMhAza2VlQ3JMMEJ2\nMGtpZlhpOOHUA0IKMTA5NDEwMzE0MmopCgd0cmFjZUlkEh4yMTNlZjAxZjE2NTU5NzY5MTYzMTQ3\nNDgxZWI1NjRqGQoIc2VuZFRpbWUSDTE2NTU5NzY5MTY0ODELACCA7Kn/mDAyAnRiowB7ImFjdGlv\nbk5hbWUiOiJGT0xMT1dfU1lTVEVNIiwiYWN0aW9uVHlwZSI6IkZBTlNfREFJTFlfVEFTS19SRUZS\nRVNIIiwiY29udGVudCI6eyJzdHJlYW1lclVzZXJJZCI6MjIwODkzODc1MTgyMSwidXNlcklkIjox\nMDk0MTAzMTQyfSwicDJwIjp0cnVlLCJ1c2VySWQiOjEwOTQxMDMxNDJ9MAkhCAGBAAokZWVlNmMx\nODktMGYwOC00ODE4LWJiNGQtZDMwMDU1Yjg3ZWIwMhAza2VlQVcxMEJGcFZNM1A1OJNOaikKB3Ry\nYWNlSWQSHjIxMmM4OWFhMTY1NTk3NjkxMDYwNzYxOTRlNzIzMWoZCghzZW5kVGltZRINMTY1NTk3\nNjkxMDY3NgsAINS+qf+YMDICdGICARIa5LiK6KGjIFhYIOS4jemAgOS4jeaNoiAxMTYaSGh0dHBz\nOi8vY2J1MDEuYWxpY2RuLmNvbS9pbWcvaWJhbmsvTzFDTjAxWXhEcmRoMUJzMmxRM21mU1ZfISEw\nLTAtY2liLmpwZyJ2aHR0cDovL2RldGFpbC5tLjE2ODguY29tL3BhZ2UvaW5kZXguaHRtbD9vZmZl\ncklkPTY3NzIwMzU3MzEzMiZ3YmNMaXZlVHlwZT1saXZlX29mZmVyX3NlbmQmZmVlZElkPTM2NTU1\nMjM1NzExMCZzdGF0dXM9MioNbGl2ZVZpZGVvSXRlbTIMNjc3MjAzNTczMTMyOgUxOS45OQ==",
"msgId2DataId": {
"3keeCrL0Bv0kifXi": "4cPKWoJ0BHbvTisf",
"3keeAW10BFpVM3P5": "4cPKWoJ0BHbvTisf"
},
"offset": "1655976917029"
}
],
"traceMsgIds": {}
}
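
The preprocessing described above (strip whitespace, then base64-decode) can be sketched as follows. This is only a minimal Node.js sketch; `decodeNativeMsgData` is a name made up for illustration and is not part of powermsg.

```javascript
// Minimal sketch (Node.js): turn one timestampList[i].data string into raw bytes.
// `decodeNativeMsgData` is a hypothetical helper name, not a powermsg API.
function decodeNativeMsgData(data) {
  // The payload contains embedded line breaks; remove all whitespace first.
  const compact = data.replace(/\s+/g, "");
  // No compression on this path: plain base64 decoding yields the frame bytes.
  return new Uint8Array(Buffer.from(compact, "base64"));
}

// Usage with the first 12 characters of the first sample above
// (a line break is inserted on purpose to show the whitespace stripping):
const headBytes = decodeNativeMsgData("MAkhCAFY\nAAok");
console.log(Buffer.from(headBytes).toString("hex")); // prints "300921080158000a24"
```

The result is the byte array that the parsing steps later in this post operate on.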

websocket

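Establishing the websocket is fairly tedious and is not the focus of this post. onmessage delivers data like the following

data is the part we need; it has to be decompressed according to compressType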

{
"protocol": "ACCS_H5",
"type": "DATA",
"compressType": "GZIP",
"err": "0",
"serviceId": "powermsg",
"dataId": "4cPMQxz0BvrFGgQv",
"data": "H4sIAAAAAAAAAI3PvU/bQBgG8NAOpC6oUtSFpQpWFySc3J3vM1sTioSSQitCFHWp7Ltzvhwnis+ED2VlY2dAYoGBoRMLa4eKrp27VWTpPwEObdWOSO/yPnr10/PiZ8vZueOM9Zpz5btIeI7LfOZgpKjjaw0dxQQHwBM0gBK9cHu6NX7jgnLS2ITvPvDvR0/LlusKxDgGCHRXrHkz8qTeULlXCCJJsUchJUQw5jKAKcOUck0CDbpLVjbWkap3+jq3+O+GcPY8k/9x8/XuBKAnxn+ZObTNwHhhZZBExi5BjAmbgFntxmNr54UmElMkBIUYAK6B0CiQAQwEYFQIxb/8mitnIRQCEy6WMrmF6enZ7fXx9OJq+vkyf/7tb5ufaZtQ7+pwIwoGdunQ9pN9ParNErtk11x79b/kfUemYduYYVwqFlvjghd2pIoKctAvmiAu1stQtddG0dbw7acEd7c/Vv1m06/sNJqeg2g6hWHUSsVOvO5FcUqZUaLTPXwkLvd7B/3NNdiqghRfj1M8VDMcEuAQ8Ed/0Op6z/x+YDK5B/DQbjsTAgAA",
"source": "accs-mass",
"target": "powermsg",
"ip": "33.8.181.220",
"extHeader": {
"0": "ZmFsc2U=",
"4": "bWFzcy1yZXNldEBANGNQTVF4dzBCS3BST3RsdUBAcG93ZXJtc2dAODhkYjMyOWEtMzdiNy00MmQ2LWJlZTEtZDc5ODAwYTk2ZjFjX3RiQEBmYWxzZUBAZmFsc2VAQDFAQDE2NTU5NzczNzExNjZAQFBNMUBAZmFsc2U=",
"9": "c2g=",
"30": "MjEwOGI1ZGMxNjU1OTc3MzcxMTcwMzQ2NWQwNTdm"
}
}
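
A minimal Node.js sketch of that decompression step. `decodeWsData` is a hypothetical name; only "GZIP" appears in the sample above, so treating every other compressType as "not compressed" is an assumption of this sketch rather than something observed.

```javascript
const zlib = require("zlib");

// Hypothetical helper: decode the `data` field of one onmessage payload.
// Assumption: any compressType other than "GZIP" means the bytes are uncompressed.
function decodeWsData(msg) {
  const raw = Buffer.from(msg.data, "base64");
  const bytes = msg.compressType === "GZIP" ? zlib.gunzipSync(raw) : raw;
  return new Uint8Array(bytes);
}

// Usage with a synthetic message (not real traffic):
const demoMsg = {
  compressType: "GZIP",
  data: zlib.gzipSync(Buffer.from([0x30, 0x09, 0x21])).toString("base64"),
};
console.log(Array.from(decodeWsData(demoMsg)).join(",")); // prints "48,9,33"
```

The output has the same shape as the decoded nativemsg payload, so both paths can share the parsing that follows.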

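Light reverse engineering

After the preprocessing above, the data should be a byte array. So how do we turn it into something readable?

With a little reverse engineering I found how powermsg handles this byte array. The code is as follows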

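// e: the preprocessed payload (a Uint8Array); o.default: powermsg's internal byte-buffer class
var t = new o.default(e.length);
t.append(e.buffer, "binary", 0);
// Byte 0: msgType (4 bits), DUP (1 bit), a 2-bit field, retain (1 bit)
var n = {}
, r = 0
, i = t.view.byteLength
, f = t.readUint8(r);
r++,
n.msgType = f >> 4,
n.DUP = f >> 3 & 1,
n.QoS = f >> 1 & 3, // the listing as published assigned n.DUP a second time here; this 2-bit field is presumably QoS
n.retain = 1 & f;
// Byte 1: version (5 bits), serializeType (3 bits)
var u = t.readUint8(r);
r++,
n.version = u >> 3,
n.serializeType = 7 & u;
// Byte 2: sysCode (3 bits), type (5 bits)
var l = t.readUint8(r);
r++,
n.sysCode = l >> 5,
n.type = 31 & l;
// Bytes 3-4: typeVersion (5 bits), then bizCode from the low 3 bits plus the next byte
var c = t.readUint8(r);
r++,
n.typeVersion = c >> 3;
var h = 7 & c
, d = t.readUint8(r);
r++,
n.bizCode = h * s + d; // s is defined outside this excerpt
// Header buffer: 2-byte length p, followed by p bytes
var p = new Uint16Array(new Uint8Array([t.readUInt8(r), t.readUInt8(r + 1)]).buffer)[0];
r += 2;
var g = new Uint8Array([].slice.call(t.view, r, r + p)).buffer;
r += p,
n.msgHeaderBuffer = g;
// Body buffer: 2-byte length y, followed by y bytes
var y = new Uint16Array(new Uint8Array([t.readUInt8(r), t.readUInt8(r + 1)]).buffer)[0];
r += 2;
var m = new Uint8Array([].slice.call(t.view, r, r + y)).buffer;
r += y,
n.msgBodyBuffer = m;
// Data buffer: 2-byte length v, followed by v bytes
var v = new Uint16Array(new Uint8Array([t.readUInt8(r), t.readUInt8(r + 1)]).buffer)[0];
r += 2;
var b = new Uint8Array([].slice.call(t.view, r, r + v)).buffer;
// If bytes remain, decode the rest recursively; a (declared outside this excerpt) accumulates the records
if (r += v,
n.msgDataBuffer = b,
a.push(n),
r < i)
return this.decode(new Uint8Array([].slice.call(e, r)));
var E = a;
return a = [],
E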

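This is easy to follow and straightforward to reimplement, so I'll just summarize it rather than give concrete code:

  1. Ignore t; treat it as the preprocessed byte array
  2. r is the index into t and advances sequentially; the single-byte reads carry little of interest
  3. The two-byte reads, and the bytes that follow each of them, are the data you actually need
  4. The uint16 construction is equivalent to shifting one byte left by 8 bits and adding the other (on a little-endian machine the second byte is the high one); that is, each buffer's length is stored in 2 bytes
  5. The only if says: there may be more than one data block here, so a single msg can yield multiple records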
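
For anyone who wants a starting point, the five points above could be sketched roughly as follows. `parseFrames` and the returned field names are made up for illustration. The sketch assumes little-endian lengths, which is how the `Uint16Array` construction behaves on little-endian machines, and it replaces the recursion with a loop. It keeps only msgType from the five single-byte header fields.

```javascript
// Sketch of a frame parser: 5 fixed bytes, then three length-prefixed buffers,
// repeated until the input is exhausted. Names are illustrative, not powermsg's.
function parseFrames(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const frames = [];
  let pos = 0;

  // Read a 2-byte little-endian length, then copy out that many bytes.
  const readChunk = () => {
    const len = view.getUint16(pos, true); // low byte + (high byte << 8)
    pos += 2;
    const chunk = bytes.slice(pos, pos + len);
    pos += len;
    return chunk;
  };

  while (pos < bytes.length) {
    const msgType = bytes[pos] >> 4; // top 4 bits of the first byte
    pos += 5; // skip the five single-byte fields
    const header = readChunk();
    const body = readChunk();
    const data = readChunk();
    frames.push({ msgType, header, body, data });
  }
  return frames;
}

// Usage with a synthetic frame (empty header and body, JSON data):
const withLength = (b) => [b.length & 0xff, b.length >> 8, ...b];
const demoFrame = Uint8Array.from([
  0x30, 0x09, 0x21, 0x08, 0x01,
  ...withLength([]),
  ...withLength([]),
  ...withLength(new TextEncoder().encode('{"totalCount":14457}')),
]);
const parsed = parseFrames(demoFrame);
console.log(parsed.length, new TextDecoder().decode(parsed[0].data));
// prints: 1 {"totalCount":14457}
```

A truncated input makes `getUint16` throw a RangeError; real code would probably want to catch that and report how many bytes were consumed.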

Going further

Each of the records parsed from a msg above gives you 3 buffers

  1. msgHeaderBuffer: of little interest
  2. msgBodyBuffer: occasionally useful
  3. msgDataBuffer: the important one

msgDataBuffer

This is simply multi-line text or a JSON string; just treat it as a string
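
As a small sketch of that step: decode the buffer as UTF-8 and fall back to the raw text when it is not JSON. UTF-8 is an assumption here, and `decodeDataBuffer` is a hypothetical helper name.

```javascript
// Hypothetical helper: interpret msgDataBuffer as UTF-8 text, parsed as JSON when possible.
function decodeDataBuffer(buf) {
  const text = new TextDecoder("utf-8").decode(buf);
  try {
    return JSON.parse(text);
  } catch {
    return text; // not JSON: keep the plain (possibly multi-line) text
  }
}

// Usage with bytes built from a string taken from the example output in this post:
const demoData = new TextEncoder().encode('{"totalCount":14457}');
console.log(decodeDataBuffer(demoData).totalCount); // prints 14457
```

Note that a payload consisting only of digits would come back as a number, because it is valid JSON; callers who always want a string can skip the `JSON.parse` step.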

msgHeaderBuffer and msgBodyBuffer

These two are rarely needed, but for completeness: parse them with protobuf's Any and reconstruct the fields yourself. It is very simple
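
If no protobuf tooling is at hand, a schema-less walk over the protobuf wire format gives a similar field-number view. Note that this is a different technique from the Any-based parsing mentioned above, shown only as a rough sketch: `walkProtobuf` and `readVarint` are illustrative names, nested messages are left as raw bytes, and the deprecated group wire types are rejected.

```javascript
// Read one base-128 varint starting at pos; returns [value as BigInt, next position].
function readVarint(bytes, pos) {
  let value = 0n;
  let shift = 0n;
  while (true) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const b = bytes[pos++];
    value |= BigInt(b & 0x7f) << shift;
    if ((b & 0x80) === 0) return [value, pos];
    shift += 7n;
  }
}

// Walk top-level fields without a .proto file.
// Returns [{ field, wireType, value }]; value is a BigInt for varints, bytes otherwise.
function walkProtobuf(bytes) {
  const fields = [];
  let pos = 0;
  while (pos < bytes.length) {
    let tag;
    [tag, pos] = readVarint(bytes, pos);
    const field = Number(tag >> 3n);
    const wireType = Number(tag & 7n);
    let value;
    if (wireType === 0) {
      [value, pos] = readVarint(bytes, pos); // varint
    } else if (wireType === 2) {
      let len;
      [len, pos] = readVarint(bytes, pos); // length-delimited: string, bytes, or nested message
      const end = pos + Number(len);
      if (end > bytes.length) throw new RangeError("truncated field");
      value = bytes.slice(pos, end);
      pos = end;
    } else if (wireType === 1 || wireType === 5) {
      const size = wireType === 1 ? 8 : 4; // fixed 64-bit / fixed 32-bit, kept as raw bytes
      value = bytes.slice(pos, pos + size);
      pos += size;
    } else {
      throw new Error(`unsupported wire type ${wireType}`);
    }
    fields.push({ field, wireType, value });
  }
  return fields;
}

// Usage with synthetic bytes: field 7 = varint 50001, field 6 = "tb"
const demoFields = Uint8Array.from([0x38, 0xd1, 0x86, 0x03, 0x32, 0x02, 0x74, 0x62]);
for (const f of walkProtobuf(demoFields)) {
  const shown = f.wireType === 2 ? new TextDecoder().decode(f.value) : f.value.toString();
  console.log(`${f.field}: ${shown}`);
}
// prints "7: 50001" then "6: tb"
```

Whether a length-delimited field is a string or a nested message cannot be decided from the bytes alone, which is why a generic dump can show a plain string as if it were a nested message.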

Example

The data from the websocket message above parses as follows. There are two records; I did not write a proto file for this

--------------------------------header buffer start---------------------------------
type_url: "88db329a-37b7-42d6-bee1-d79800a96f1c"
6: "3kegwA30BuVN1MQ"
7: 50001
8: "3392784020"
13: {
1: "traceId"
2: "212c64a616559773704674668e5fe0"
}
13: {
1: "sendTime"
2: {
6: 53
6: 0x3733373739353536
7: 55
}
}

---------------------------------body buffer start----------------------------------
4: 1655977370587
6: "tb"

---------------------------------data buffer start----------------------------------
{"totalCount":14457}
------------------------------------------------------------------------------------


--------------------------------header buffer start---------------------------------
type_url: "88db329a-37b7-42d6-bee1-d79800a96f1c"
6: "9e5c46299614008e09e2fcf1f907699d"
7: 30017
8: "11994589"

---------------------------------body buffer start----------------------------------
value: "\345\234\237\344\270\215\345\245\263\345\255\251"
4: 1655977370660
6: "tb"

---------------------------------data buffer start----------------------------------
{"levelInfo":{"buyerLevel":"L3","buyerLevelPic":"https://gw.alicdn.com/tfs/TB1dhDrnOpE_u4jSZKbXXbCUVXa-26-26.png","isFans":"true","levelPic":"https://gw.alicdn.com/tfs/TB1cykzmND1gK0jSZFsXXbldVXa-150-50.png","levelText":"L3"}}
------------------------------------------------------------------------------------

https://back.pub/post/spirder-taobao-websocket-danmu/
Author: Dash
Published: June 23, 2022