Using Protobuf Encoding and Decoding in Go

Protobuf

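Protobuf is short for Protocol Buffers, a data description language developed by Google and open-sourced in 2008. When it was first released, Protobuf was positioned alongside data description languages such as XML and JSON: a bundled tool generates code that serializes structured data. What we care about more here is Protobuf as a language for describing interface specifications, which makes it a foundation for designing safe, cross-language RPC interfaces.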

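Why choose Protobuf

When picking an encoding/decoding tool, we generally consider:

Encoding and decoding efficiency
High compression ratio
Multi-language support

Of these, compression ratio and efficiency get the most attention.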

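Workflow

First we define our data, then run it through the compiler to generate code for different languages.

Start by creating a hello.proto file that wraps the string type used by the HelloService service. A message definition has the following general shape: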

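<comment>

message  <message_name> {
  <field_rule>  <field_type> <field_name> = <field_number>;
                   type         name          number
}

comment: a comment, written as /* */ or //
message_name: must be unique within a package
field_rule: optional; the most common is repeated (oneof, covered below, is declared as a block of its own)
field_type: the data type, one of the types defined by protobuf; generated code maps it to the corresponding type in the target language
field_name: the field name; must be unique within a message
field_number: the field number, which identifies the field in the serialized binary data; must be unique within a message. Numbers 1 to 15 take 1 byte to encode, 16 to 2047 take 2 bytes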
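The byte counts quoted for field numbers follow from how the field key is stored: the key is the varint encoding of (field_number << 3) | wire_type. The sketch below uses only the Go standard library to compute that size; tagSize is a hypothetical helper name, not part of any Protobuf API, and binary.AppendUvarint needs Go 1.19 or later.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tagSize returns how many bytes the field key occupies on the wire.
// The key is the varint encoding of (fieldNumber << 3) | wireType.
func tagSize(fieldNumber, wireType uint64) int {
	return len(binary.AppendUvarint(nil, fieldNumber<<3|wireType))
}

func main() {
	for _, n := range []uint64{1, 15, 16, 2047, 2048} {
		fmt.Printf("field %d -> %d byte(s)\n", n, tagSize(n, 0))
	}
}
```

Because the low numbers are cheapest, it is common to give numbers 1 to 15 to the fields that appear most often.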

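Installing the compiler

The protobuf compiler is called protoc (protobuf compiler). Download it from the releases page: Github Protobuf

The archive contains:

include, header or library files
bin, the protoc compiler
readme.txt, be sure to read it and follow its installation steps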

Installing the compiler binary

On Linux/Unix, simply run:

mv bin/protoc /usr/bin

On Windows:

Note: in git-bash on Windows, the default /usr/bin directory is C:\Program Files\Git\usr\bin\

So we first move the protoc compiler from bin into C:\Program Files\Git\usr\bin\

Installing the compiler libraries

The library files under include need to be installed into /usr/local/include/

On Linux/Unix, simply run:

mv include/google /usr/local/include

On Windows, copy them to:

C:\Program Files\Git\usr\lib\include

Verifying the installation

$ protoc --version
libprotoc 3.19.1

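Installing the Go plugin

The core Protobuf toolset is written in C++, and the official protoc compiler does not support Go out of the box. To generate Go code from a .proto file such as the hello.proto above, install the corresponding plugin:

go install google.golang.org/protobuf/cmd/protoc-gen-go@latest

After that we can use protoc to generate the Go data structures we need. protoc looks for the protoc-gen-go binary on PATH, so make sure the Go bin directory ($GOBIN, or $GOPATH/bin) is on it.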

Defining Protobuf message types

Writing the .proto file

Message types are defined in .proto files. Here we create a person.proto file and put the following classic example in it:

syntax = "proto3";

option go_package = "/person";

package example;

message Person {
    string name = 1;
    int32 age = 2;
    repeated string hobbies = 3;
}

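syntax: selects the proto3 syntax. The third version of Protobuf pared the language down: every field is initialized to a zero value, much as in Go (custom default values are no longer supported), so fields no longer need the required feature.
package example: specifies the package the message type belongs to; here the package name is example.
option go_package = "/person"; specifies the import path of the generated Go code; the Go package name defaults to the last element of that path (person).
message Person { ... }: defines a message type named Person.
string name = 1: defines a string field named name with field number 1. int32 age = 2: defines an integer field named age with field number 2.
repeated string hobbies = 3: defines a field named hobbies that holds a list of strings, with field number 3. The repeated keyword marks the field as a list.

This .proto file defines a message type named Person with three fields: name, age, and hobbies. name and age are ordinary singular fields; hobbies is a list of strings. Every field has a unique field number, which identifies that field (together with its wire type) in the binary encoding.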
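To make the role of the field numbers concrete, here is a minimal sketch that hand-encodes a Person using only the Go standard library. It is not how the generated code works internally; appendTag, appendString, and encodePerson are hypothetical helper names, and unlike a real encoder this sketch does not skip zero values.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	wireVarint = 0 // int32, int64, bool, enum
	wireLen    = 2 // string, bytes, nested messages
)

// appendTag writes the field key: (fieldNumber << 3) | wireType as a varint.
func appendTag(b []byte, fieldNumber, wireType uint64) []byte {
	return binary.AppendUvarint(b, fieldNumber<<3|wireType)
}

// appendString writes a length-delimited string field: key, length, bytes.
func appendString(b []byte, fieldNumber uint64, s string) []byte {
	b = appendTag(b, fieldNumber, wireLen)
	b = binary.AppendUvarint(b, uint64(len(s)))
	return append(b, s...)
}

// encodePerson hand-encodes name (1), age (2), and hobbies (3).
func encodePerson(name string, age int32, hobbies []string) []byte {
	var b []byte
	b = appendString(b, 1, name)
	b = appendTag(b, 2, wireVarint)
	b = binary.AppendUvarint(b, uint64(age)) // negative values sign-extend to 64 bits
	for _, h := range hobbies {
		b = appendString(b, 3, h) // one key per element for repeated strings
	}
	return b
}

func main() {
	fmt.Printf("% x\n", encodePerson("yzy", 23, []string{"music", "sport"}))
}
```

The first byte, 0a, is field 1 with wire type 2; 10 is field 2 with wire type 0; 1a is field 3 with wire type 2. The field names never appear in the output, which is why renaming a field is safe but renumbering it is not.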

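Generating Go code with protoc

In the directory containing this file, run the following command to generate the Go code.

protoc --go_out=. *.proto

Afterwards a new person folder appears in the directory, containing the Go file person.pb.go. This file defines a Person struct along with its methods: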

type Person struct {
   state         protoimpl.MessageState
   sizeCache     protoimpl.SizeCache
   unknownFields protoimpl.UnknownFields

   Name    string   `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"`
   Age     int32    `protobuf:"varint,2,opt,name=age,proto3" json:"age,omitempty"`
   Hobbies []string `protobuf:"bytes,3,rep,name=hobbies,proto3" json:"hobbies,omitempty"`
}

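Besides the struct there are many methods, which provide the infrastructure for encoding, decoding, and manipulating Protocol Buffers messages. The main ones are:

func (*Person) Reset(): resets the Person message to its default values.
func (*Person) String() string: returns a string containing the text representation of the Person message.
func (*Person) ProtoMessage(): makes the Person struct satisfy the proto.Message interface, which serializing and deserializing Protobuf messages require.
func (*Person) Descriptor() ([]byte, []int): returns descriptor information about the Person message type.
func (*Person) GetName() string: returns the value of the Name field of the Person message.
func (*Person) GetAge() int32: returns the value of the Age field of the Person message.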

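A simple test

package main

import (
	"fmt"

	// Assumed import path: replace example.com/demo with the module path in your go.mod.
	"example.com/demo/person"
)

func main() {
	p := &person.Person{Name: "yzy", Age: 23, Hobbies: []string{"music", "sport"}}
	fmt.Println("string", p.String())
	fmt.Println("the data:", p.GetName(), p.GetAge(), p.GetHobbies())

	fmt.Println("-----------")
	fmt.Println("reset the person")
	fmt.Println("-----------")

	p.Reset()
	fmt.Println("string", p.String())
	fmt.Println("the data:", p.GetName(), p.GetAge(), p.GetHobbies())
	// After running this you can see that the getters, String, and Reset all work.
}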

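Serializing and deserializing messages

The serialization and deserialization functions live in the google.golang.org/protobuf/proto package (the older github.com/golang/protobuf/proto is deprecated in its favor). If your module does not depend on it yet, fetch it with go get google.golang.org/protobuf. Below is an example that serializes and deserializes a message, then checks that the data is the same before and after the round trip.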

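// Assumes this test file sits next to person.pb.go in the person package.
package person

import (
	"testing"

	"google.golang.org/protobuf/proto"
)

func TestPersonSerialization(t *testing.T) {
	// Create a Person message and set its fields.
	p := &Person{Name: "yzy", Age: 23, Hobbies: []string{"music", "sport"}}

	// Serialize the message to the binary wire format.
	data, err := proto.Marshal(p)
	if err != nil {
		t.Fatal("marshaling error: ", err)
	}

	// Deserialize into a fresh message.
	p2 := &Person{}
	if err := proto.Unmarshal(data, p2); err != nil {
		t.Fatal("unmarshaling error: ", err)
	}

	// Compare the original with the round-tripped message. proto.Equal compares
	// field by field; the output of String() is not guaranteed to be stable.
	if !proto.Equal(p, p2) {
		t.Fatalf("original message %v != unmarshaled message %v", p, p2)
	}
	t.Log("data matches")
}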

Field types

The field types section draws on Go Protobuf 简明教程.

Scalar types

proto type	Go type	Notes
double	float64
float	float32
int32	int32
int64	int64
uint32	uint32
uint64	uint64
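sint32	int32	ZigZag-encoded; efficient for negative values
sint64	int64	ZigZag-encoded; efficient for negative values
fixed32	uint32	fixed-length encoding (always 4 bytes); suits values that are often larger than 2^28
fixed64	uint64	fixed-length encoding (always 8 bytes); suits values that are often larger than 2^56
sfixed32	int32	fixed-length encoding (always 4 bytes)
sfixed64	int64	fixed-length encoding (always 8 bytes)
bool	bool
string	string	UTF-8 encoded; length at most 2^32
bytes	[]byte	arbitrary byte sequence; length at most 2^32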
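A scalar field that has not been assigned is not serialized; on parsing it receives the default value:

strings: the empty string
bytes: the empty sequence
bools: false
numeric types: 0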
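A consequence of these defaults can be sketched with a hand-written encoder for a single int32 field. This is an illustration built on the standard library only, not the real implementation; appendInt32Field is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendInt32Field hand-encodes a proto3 int32 field. A value equal to the
// default (0) is skipped entirely, so it costs no bytes on the wire.
func appendInt32Field(b []byte, fieldNumber uint64, v int32) []byte {
	if v == 0 {
		return b
	}
	b = binary.AppendUvarint(b, fieldNumber<<3) // wire type 0 (varint)
	return binary.AppendUvarint(b, uint64(v))
}

func main() {
	fmt.Println(len(appendInt32Field(nil, 2, 0)))  // age left at 0
	fmt.Println(len(appendInt32Field(nil, 2, 23))) // age = 23
}
```

Since a zero value and an unset field look identical on the wire, the receiver cannot tell them apart for plain scalar fields.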

Enumerations

An enum type offers a set of predefined values from which one is chosen. For example, we can define gender as an enum.

message Student {
  string name = 1;
  enum Gender {
    FEMALE = 0;
    MALE = 1;
  }
  Gender gender = 2;
  repeated int32 scores = 3;
}

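The first value of an enum must map to 0, which is also the enum's default value.
Aliases: different enum names may be assigned the same number, which makes them aliases; this requires turning on the allow_alias option.
Reserved values: enums also support reserved, for both numbers and names.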

message EnumAllowAlias {
  enum Status {
    option allow_alias = true;
    UNKNOWN = 0;
    STARTED = 1;
    RUNNING = 1;
  }
  enum Foo {
    A = 0;
    B = 1;
    reserved 2, 15, 9 to 11, 40 to max;
    reserved "FOO", "BAR";
  }
}
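On the Go side, an enum nested in a message becomes a named int32 type whose constants are prefixed with the enclosing message name. The sketch below is a hand-written approximation of that shape for Student.Gender, not actual protoc-gen-go output; the genderNames map and the String fallback format are assumptions made for illustration.

```go
package main

import "fmt"

// Student_Gender mirrors the naming used for an enum nested in a message:
// <Message>_<Enum> for the type, <Message>_<VALUE> for the constants.
type Student_Gender int32

const (
	Student_FEMALE Student_Gender = 0
	Student_MALE   Student_Gender = 1
)

// genderNames is a hypothetical lookup table from value to name.
var genderNames = map[Student_Gender]string{
	Student_FEMALE: "FEMALE",
	Student_MALE:   "MALE",
}

// String returns the enum name, or a numeric form for unknown values.
func (g Student_Gender) String() string {
	if name, ok := genderNames[g]; ok {
		return name
	}
	return fmt.Sprintf("Student_Gender(%d)", int32(g))
}

func main() {
	var g Student_Gender // zero value is the first enum entry
	fmt.Println(g, Student_MALE, Student_Gender(7))
}
```

The zero value of the Go type is the entry numbered 0, which is why the first enum value doubles as the default.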

List types

What if we want to declare something like []string or []Item? The repeated field rule handles that:

message SearchResponse {
  repeated Result results = 1;
}

// Compiles to (abridged):
// type SearchResponse struct {
//    Results []*Result
// }
// }
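For repeated numeric fields, such as repeated int32 scores = 3 in the Student message, proto3 uses a packed encoding by default: one key, one length, then all values back to back. The sketch below hand-encodes that layout with the standard library; appendPackedInt32 is a hypothetical helper, not a library function.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendPackedInt32 writes a repeated int32 field in packed form:
// a single key (wire type 2), the payload length, then the varints.
func appendPackedInt32(b []byte, fieldNumber uint64, vals []int32) []byte {
	if len(vals) == 0 {
		return b // an empty list writes nothing
	}
	var payload []byte
	for _, v := range vals {
		payload = binary.AppendUvarint(payload, uint64(v))
	}
	b = binary.AppendUvarint(b, fieldNumber<<3|2)
	b = binary.AppendUvarint(b, uint64(len(payload)))
	return append(b, payload...)
}

func main() {
	fmt.Printf("% x\n", appendPackedInt32(nil, 3, []int32{90, 100, 300}))
}
```

Repeated strings and messages are not packed; each element carries its own key, as with the results field above.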

Using other message types

Result is another message type, used in SearchResponse as the type of a field.

message SearchResponse {
  repeated Result results = 1; 
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

Nested definitions are supported too:

message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}

If the type is defined in another file, import that file to use its message types:

import "myproject/other_protos.proto";

The Any type

When the data type cannot be pinned down in advance, use Any:

import "google/protobuf/any.proto";

message ErrorStatus {
  string message = 1;
  repeated google.protobuf.Any details = 2;
}

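Under the hood, Any is essentially a bytes payload (the serialized message) plus a type URL that names the packed type. The generated struct for ErrorStatus: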

type ErrorStatus struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	Message string       `protobuf:"bytes,1,opt,name=message,proto3" json:"message,omitempty"`
	Details []*anypb.Any `protobuf:"bytes,2,rep,name=details,proto3" json:"details,omitempty"`
}
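The idea behind Any can be sketched with a hand-written stand-in: a type URL plus the serialized bytes of the wrapped message. The Any struct, pack, and messageName below are illustrative assumptions and do not reproduce the anypb API; only the "type.googleapis.com/" prefix reflects the default used for type URLs.

```go
package main

import (
	"fmt"
	"strings"
)

// Any is a hand-written stand-in for the real message: it carries the
// serialized bytes of some message plus a URL identifying its type.
type Any struct {
	TypeURL string
	Value   []byte
}

// pack wraps already-serialized bytes together with the full message name.
func pack(fullName string, serialized []byte) Any {
	return Any{TypeURL: "type.googleapis.com/" + fullName, Value: serialized}
}

// messageName returns the part of the type URL after the last slash.
func (a Any) messageName() string {
	return a.TypeURL[strings.LastIndex(a.TypeURL, "/")+1:]
}

func main() {
	detail := pack("example.Person", []byte{0x10, 0x17}) // age = 23
	fmt.Println(detail.TypeURL, len(detail.Value))
	fmt.Println(detail.messageName())
}
```

The receiver reads the type URL first and only then decides which message type to decode Value into.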

oneof

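A oneof resembles a union (or a constrained generic): at most one of the listed fields can be set at a time. In the example below, the test_oneof field must hold either Sub1 sub1 or Sub2 sub2: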

message Sub1 {
    string name = 1;
}

message Sub2 {
    string name = 1;
}

message SampleMessage {
    oneof test_oneof {
        Sub1 sub1 = 1;
        Sub2 sub2 = 2;
    }
}

Compile it with:

protoc -I=. --go_out=./pb --go_opt=module="/pb" pb/oneof.proto

The generated struct:

type SampleMessage struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	// Types that are assignable to TestOneof:
	//	*SampleMessage_Sub1
	//	*SampleMessage_Sub2
	TestOneof isSampleMessage_TestOneof `protobuf_oneof:"test_oneof"`
}

So how do we use it? Each case gets a getter, which returns nil when that case is not the one set:

of := &pb.SampleMessage{}
of.GetSub1()
of.GetSub2()
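Setting a oneof and branching on it relies on wrapper types and a type switch. The sketch below is a hand-written, simplified imitation of the generated shape (the interface is shortened to isTestOneof, and describe is a hypothetical helper), meant only to show the pattern without requiring the generated pb package.

```go
package main

import "fmt"

type Sub1 struct{ Name string }
type Sub2 struct{ Name string }

// isTestOneof is a sealed interface: only the wrapper types implement it.
type isTestOneof interface{ isTestOneof() }

// One wrapper struct per oneof case.
type SampleMessage_Sub1 struct{ Sub1 *Sub1 }
type SampleMessage_Sub2 struct{ Sub2 *Sub2 }

func (*SampleMessage_Sub1) isTestOneof() {}
func (*SampleMessage_Sub2) isTestOneof() {}

type SampleMessage struct{ TestOneof isTestOneof }

// GetSub1 returns the Sub1 value, or nil if another case (or none) is set.
func (m *SampleMessage) GetSub1() *Sub1 {
	if x, ok := m.TestOneof.(*SampleMessage_Sub1); ok {
		return x.Sub1
	}
	return nil
}

// describe branches on whichever case is currently set.
func describe(m *SampleMessage) string {
	switch v := m.TestOneof.(type) {
	case *SampleMessage_Sub1:
		return "sub1:" + v.Sub1.Name
	case *SampleMessage_Sub2:
		return "sub2:" + v.Sub2.Name
	default:
		return "unset"
	}
}

func main() {
	m := &SampleMessage{TestOneof: &SampleMessage_Sub2{Sub2: &Sub2{Name: "b"}}}
	fmt.Println(describe(m), m.GetSub1() == nil)
}
```

Assigning a new wrapper to TestOneof replaces the previous case, which is how the "only one at a time" rule is enforced in Go.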

map

To declare a map, write:

map<string, Project> projects = 3; // Go: Projects map[string]*Project

The protobuf syntax for declaring a map:

map<key_type, value_type> map_field = N;

For example:

message MapRequest {
  map<string, int32> points = 1;
}
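On the wire, a map field is equivalent to a repeated entry message whose key is field 1 and whose value is field 2. The sketch below hand-encodes a single entry of points with the standard library; appendMapEntry is a hypothetical helper, and a real encoder may differ in details such as entry ordering.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendMapEntry writes one map<string, int32> entry as a nested message:
// key is field 1 (length-delimited), value is field 2 (varint).
func appendMapEntry(b []byte, fieldNumber uint64, key string, value int32) []byte {
	var entry []byte
	entry = binary.AppendUvarint(entry, 1<<3|2) // key: field 1, wire type 2
	entry = binary.AppendUvarint(entry, uint64(len(key)))
	entry = append(entry, key...)
	entry = binary.AppendUvarint(entry, 2<<3) // value: field 2, wire type 0
	entry = binary.AppendUvarint(entry, uint64(value))

	b = binary.AppendUvarint(b, fieldNumber<<3|2) // the map field itself
	b = binary.AppendUvarint(b, uint64(len(entry)))
	return append(b, entry...)
}

func main() {
	fmt.Printf("% x\n", appendMapEntry(nil, 1, "a", 7))
}
```

This equivalence is also why a map field cannot be marked repeated and why the order of entries on the wire is not defined.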

Nested types

We can also nest a message inside another message:

message Outer {                  // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      int64 ival = 1;
      bool  booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      int32 ival = 1;
      bool  booly = 2;
    }
  }
}

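This works like nesting structs in Go, except that anonymous embedding is not allowed: a field of a nested type must be given a name.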
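Go has no nested type declarations, so nested messages are flattened into top-level types whose names join the nesting path with underscores. The sketch below shows hand-written equivalents of the two Inner messages; goTypeName is a hypothetical helper that only illustrates the naming rule.

```go
package main

import (
	"fmt"
	"strings"
)

// goTypeName joins the nesting path of a message with underscores.
func goTypeName(path ...string) string {
	return strings.Join(path, "_")
}

// Hand-written equivalents of the two Inner messages. They can coexist
// because the enclosing message names are part of the Go type name.
type Outer_MiddleAA_Inner struct {
	Ival  int64
	Booly bool
}

type Outer_MiddleBB_Inner struct {
	Ival  int32
	Booly bool
}

func main() {
	fmt.Println(goTypeName("Outer", "MiddleAA", "Inner"))
	fmt.Printf("%+v\n", Outer_MiddleBB_Inner{Ival: 7, Booly: true})
}
```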

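Defining services

If the message types are meant for remote procedure calls (RPC), the RPC service interface can be defined in the .proto file as well. For example, here we define an RPC service named SearchService that offers a Search method taking a SearchRequest and returning a SearchResponse: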

service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

The official repository also maintains a list of plugins that help with building RPC services on top of Protocol Buffers.
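protoc-gen-go on its own generates only message types; the service code comes from an RPC plugin. As a rough idea of what a service definition corresponds to in Go, here is a hand-written sketch: the interface shape, the Query field on SearchRequest, memorySearch, and countMatches are all assumptions made for illustration, not output of any plugin.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

type SearchRequest struct{ Query string } // Query is a hypothetical field
type Result struct{ URL, Title string }
type SearchResponse struct{ Results []*Result }

// SearchService mirrors: rpc Search (SearchRequest) returns (SearchResponse);
type SearchService interface {
	Search(ctx context.Context, req *SearchRequest) (*SearchResponse, error)
}

// memorySearch is a toy in-memory implementation of the interface.
type memorySearch struct{ pages []*Result }

func (s *memorySearch) Search(_ context.Context, req *SearchRequest) (*SearchResponse, error) {
	resp := &SearchResponse{}
	for _, p := range s.pages {
		if strings.Contains(p.Title, req.Query) {
			resp.Results = append(resp.Results, p)
		}
	}
	return resp, nil
}

// countMatches calls the service and returns how many results came back.
func countMatches(svc SearchService, query string) int {
	resp, err := svc.Search(context.Background(), &SearchRequest{Query: query})
	if err != nil {
		return 0
	}
	return len(resp.Results)
}

func main() {
	svc := &memorySearch{pages: []*Result{
		{URL: "https://example.com/a", Title: "protobuf basics"},
		{URL: "https://example.com/b", Title: "json basics"},
	}}
	fmt.Println(countMatches(svc, "protobuf"))
}
```

Coding against an interface like this keeps callers independent of whether the implementation is local or reached over the network.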

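Other protoc options

Command-line usage

protoc --proto_path=IMPORT_PATH --<lang>_out=DST_DIR path/to/file.proto

--proto_path=IMPORT_PATH: a .proto file can import other .proto files, and proto_path sets the directory in which to look for them. If no other .proto files are imported, this option can be omitted.
--<lang>_out=DST_DIR: the directory to generate code into; for example, --go_out=. writes the Go code to the current directory. Other supported languages include cpp/java/python/ruby/objc/csharp/php.

For more, see Updating A Message Type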