Wednesday, January 14, 2015

Python Google protobuf and an optional field with an empty repeated field inside (or the "has_field = true" analog in Python)

So I was searching for a way to represent protobufs in a human-readable format. After lots of googling I found that there is a magic built-in module called text_format, which does just what I need: it converts protobufs to and from a human-readable format that looks quite similar to json. It is not valid json, as the types json supports don't match protobuf's, and the syntax is slightly different. Pb text is fine for reading, but very few tools support it. For example, if you need some kind of xpath analog to search inside protobufs, you will be disappointed, as no such thing is freely available (though on some forums Google developers mentioned that they have one, but they can't or don't want to share it). So I decided to try converting pb to json.
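For reference, here is roughly what a text_format round trip looks like. This is only a sketch: it borrows FileDescriptorProto as the message because that type ships with the protobuf package, so nothing has to be compiled first, and the field values ("demo.proto" and so on) are made up.

```python
from google.protobuf import descriptor_pb2, text_format

# Any message type works here; FileDescriptorProto is used only because it
# is bundled with the protobuf package. The values are placeholders.
msg = descriptor_pb2.FileDescriptorProto(name="demo.proto", package="demo")
msg.dependency.append("other.proto")

# Message -> human-readable text.
text = text_format.MessageToString(msg)
print(text)

# Text -> message again.
parsed = text_format.Parse(text, descriptor_pb2.FileDescriptorProto())
print(parsed == msg)
```

The printed text uses a `field: value` layout, which is why it reads a lot like json without actually being json.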

There are a bunch of not-so-popular pb<->json converters out there but, as it turned out, they all have the same bug related to handling an optional field with an empty repeated field inside. Here is what I mean:
message Bar {
  repeated int32 baz = 1;
}

message Foo {
  optional Bar bar = 1;
}
Even if baz contains 0 entries, a bar that has been set is still there, so it should still be reported as present.

Those pb<->json converters do convert pb to json correctly, so a Foo foo with bar set and baz empty looks like:
{
  "bar" : {}
}
But when converting back, they just lose it. The repeated baz is represented by a python list, so if baz has no entries (baz == []), all the converter has to put into foo.bar is an empty list, and protobuf will think that you didn't set foo.bar at all. So, if you do the conversion pb->json->pb->json you will see:
{
}
Which indicates that protobuf just dropped your optional field (which should be set) with the empty repeated field inside.
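The drop is easy to reproduce without any real converter. The sketch below is an illustration only: it builds Foo and Bar at runtime from a descriptor (the package name demo, the file name, and the helpers build_foo_class and naive_dict_to_foo are all made up for this example), and the "converter" is a deliberately naive stand-in that copies list entries one by one, which is my assumption about what the real converters roughly do.

```python
from google.protobuf import descriptor_pb2, descriptor_pool, message_factory


def build_foo_class():
    """Build the Foo message class at runtime, so no protoc step is needed."""
    fdp = descriptor_pb2.FieldDescriptorProto
    file_proto = descriptor_pb2.FileDescriptorProto(
        name="foo_bar_demo.proto", package="demo")  # hypothetical names
    bar = file_proto.message_type.add(name="Bar")
    bar.field.add(name="baz", number=1,
                  type=fdp.TYPE_INT32, label=fdp.LABEL_REPEATED)
    foo = file_proto.message_type.add(name="Foo")
    foo.field.add(name="bar", number=1, type=fdp.TYPE_MESSAGE,
                  type_name=".demo.Bar", label=fdp.LABEL_OPTIONAL)

    pool = descriptor_pool.DescriptorPool()
    pool.AddSerializedFile(file_proto.SerializeToString())
    descriptor = pool.FindMessageTypeByName("demo.Foo")
    # Newer protobuf releases expose GetMessageClass; older ones GetPrototype.
    if hasattr(message_factory, "GetMessageClass"):
        return message_factory.GetMessageClass(descriptor)
    return message_factory.MessageFactory(pool).GetPrototype(descriptor)


Foo = build_foo_class()


def naive_dict_to_foo(data):
    """Naive stand-in converter: copies baz entries and nothing else."""
    foo = Foo()
    for value in data.get("bar", {}).get("baz", []):
        foo.bar.baz.append(value)  # never runs when baz is empty
    return foo


dropped = naive_dict_to_foo({"bar": {}})
kept = naive_dict_to_foo({"bar": {"baz": [1, 2]}})
print(dropped.HasField("bar"))  # False: bar was silently lost
print(kept.HasField("bar"))     # True: appending an entry marks bar as set
```

With an empty baz the loop body never executes, so nothing ever marks bar as present, and the message serializes to zero bytes.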

In C, you have a has_* field to mark that the field is present, so it is pretty straightforward.
But in python there is no such field to set, and a brief look through the pb methods didn't reveal anything appropriate. After a bit of digging into the text_format sources, though, I found a method called SetInParent() that does the same thing the has_* field does in C. So if you do foo.bar.SetInParent(), it will mark bar as present (the analog of setting has_bar), and after the pb->json->pb->json conversion you will see:
{
  "bar" : {}
}
Which is correct.
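Here is a minimal sketch of how a converter could use SetInParent() when going from a dict back to a message. As before, Foo and Bar are built at runtime, and the package name demo, the file name, and the helpers build_foo_class and dict_to_foo are hypothetical names for this example, not part of any existing converter.

```python
from google.protobuf import descriptor_pb2, descriptor_pool, message_factory


def build_foo_class():
    """Build the Foo message class at runtime, so no protoc step is needed."""
    fdp = descriptor_pb2.FieldDescriptorProto
    file_proto = descriptor_pb2.FileDescriptorProto(
        name="foo_bar_demo.proto", package="demo")  # hypothetical names
    bar = file_proto.message_type.add(name="Bar")
    bar.field.add(name="baz", number=1,
                  type=fdp.TYPE_INT32, label=fdp.LABEL_REPEATED)
    foo = file_proto.message_type.add(name="Foo")
    foo.field.add(name="bar", number=1, type=fdp.TYPE_MESSAGE,
                  type_name=".demo.Bar", label=fdp.LABEL_OPTIONAL)

    pool = descriptor_pool.DescriptorPool()
    pool.AddSerializedFile(file_proto.SerializeToString())
    descriptor = pool.FindMessageTypeByName("demo.Foo")
    # Newer protobuf releases expose GetMessageClass; older ones GetPrototype.
    if hasattr(message_factory, "GetMessageClass"):
        return message_factory.GetMessageClass(descriptor)
    return message_factory.MessageFactory(pool).GetPrototype(descriptor)


Foo = build_foo_class()


def dict_to_foo(data):
    """Convert a dict to Foo, keeping bar even when baz is empty."""
    foo = Foo()
    if "bar" in data:
        foo.bar.SetInParent()  # mark bar as present before filling it
        for value in data["bar"].get("baz", []):
            foo.bar.baz.append(value)
    return foo


foo = dict_to_foo({"bar": {}})
wire = foo.SerializeToString()

# Parse the bytes back to check that presence survives the round trip.
restored = Foo()
restored.ParseFromString(wire)
print(restored.HasField("bar"))  # True
print(len(restored.bar.baz))     # 0
```

The key presence check ("bar" in data) is what decides whether SetInParent() gets called, so a dict without "bar" still produces a Foo with bar unset.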