Tuesday, April 7, 2015

CRIU as a debug tool and replacement for google coredumper

I'm currently working on a criu images -> core dump conversion for CRIT (CRiu Image Tool). While looking for info on generating core dumps manually, I found an interesting yet outdated (OMG, last changed in 2007!) project called google coredumper[1], which lets you generate core dumps whenever you want to and looks like a cool thing for debugging. CRIU can dump a process at any point too, and it provides a lot more info about the process state, because the process can literally be fully restored from the images. So I thought that coredumper users (if there are any today) could use CRIU for their purposes. That said, criu images -> core dump conversion looks like a complete waste of data, so I have another thought: somehow integrate criu images into gdb, so that it could ask criu to restore the process with --leave-stopped and then attach to it for debugging. It may also be a good thing to be able to save the state of the task being debugged by detaching and calling criu dump.
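
Until such an integration exists, the same flow can be done by hand. Here is a rough sketch in python that only builds the command lines and prints them. The helper names, the paths and the pid are made up for illustration; --restore-detached and --pidfile are my additions so that criu returns and leaves the pid somewhere gdb can pick it up. Depending on how the task was started, other options (such as --shell-job) may be needed too, and criu usually wants root.

```python
# A sketch only: these helpers build argv lists, they do not run anything.
# All paths and the pid below are hypothetical examples.

def restore_stopped_cmd(images_dir, pidfile):
    """criu restore that leaves the restored task stopped and returns to the caller."""
    return [
        "criu", "restore",
        "-D", images_dir,        # directory with the criu images
        "--leave-stopped",       # keep the task stopped after restore
        "--restore-detached",    # do not wait for the restored task to exit
        "--pidfile", pidfile,    # write the pid of the restored root task here
    ]


def gdb_attach_cmd(pid):
    """Attach gdb to the (stopped) restored task."""
    return ["gdb", "-p", str(pid)]


def dump_cmd(pid, images_dir):
    """Save the state of the task again after gdb has detached from it."""
    return ["criu", "dump", "-t", str(pid), "-D", images_dir]


if __name__ == "__main__":
    for cmd in (
        restore_stopped_cmd("/tmp/images", "/tmp/restored.pid"),
        gdb_attach_cmd(1234),
        dump_cmd(1234, "/tmp/images-after-debug"),
    ):
        print(" ".join(cmd))
```

Running the printed commands one by one (reading the real pid from the pidfile instead of 1234) is what a gdb integration would have to automate.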

[1] https://code.google.com/p/google-coredumper/

Wednesday, January 14, 2015

python google protobuf and optional field with empty repeated inside (or the "has_field = true" analog in python)

So I was searching for something to represent protobufs in a human-readable format. After lots of googling I found that there is a magic built-in module called text_format, which does just what I need: it converts protobufs to/from a human-readable format that looks quite similar to json. It is not valid json, as the types json supports don't match those of protobuf, and the syntax is slightly different. Pb text is fine for reading, but very few tools support it. For example, if you need some kind of xpath analog to search inside protobufs, you will be disappointed, as there is no such thing freely available (though on some forums google developers mentioned that they have one, but they can't or don't want to share it). So I decided to try to convert pb to json.
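
For reference, here is a small sketch of what text_format does. To keep it self-contained it uses FileDescriptorProto, a message type that ships with the protobuf python package, instead of a compiled .proto of my own; the field values are made up.

```python
from google.protobuf import text_format
from google.protobuf.descriptor_pb2 import FileDescriptorProto

# Any message type would do; this one is bundled with the protobuf package,
# so nothing has to be compiled with protoc for the example.
msg = FileDescriptorProto(name="demo.proto", package="demo")
msg.dependency.append("other.proto")

# protobuf -> human-readable text
text = text_format.MessageToString(msg)
print(text)

# human-readable text -> protobuf
parsed = text_format.Parse(text, FileDescriptorProto())
print(parsed == msg)
```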

There are a bunch of not-so-popular pb<->json converters out there but, as it turned out, they all have the same bug related to handling an optional field with an empty repeated field inside. Here is what I mean:

message Bar {
    repeated int32 baz = 1;
}

message Foo {
    optional Bar bar = 1;
}

Even if you have baz containing 0 entries, it is still there, so bar should be present too.

Those pb<->json converters do convert pb to json appropriately, so Foo foo looks like:

{
    "bar" : {}
}

But when converting back, they just miss it: repeated baz is represented by a python list, so if baz has no entries (baz == []) the converter ends up setting nothing inside foo.bar, and protobuf will think that you didn't set foo.bar at all. So, if you do a pb->json->pb->json conversion you will see:

{
}

Which indicates that protobuf just dropped your optional field (that should be set) with the empty repeated inside.

In C, you have a has_* field to mark that the field is present, so it is pretty straightforward. But in python there was no such field to set, and a brief look into the pb methods didn't reveal anything appropriate. After a bit of digging into the text_format sources I found a method called SetInParent() that does the same thing the has_* field does in C. So if you call foo.bar.SetInParent(), it will mark bar as present (the has_bar analog), and after a pb->json->pb->json conversion you will see:

{
    "bar" : {}
}

Which is correct.
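
Here is a minimal sketch of the fix. I'm assuming a stand-in for Foo/Bar so that it runs without protoc: FileDescriptorProto from the protobuf package has an optional message field called options, which in turn contains a repeated field, so it has the same shape as Foo.bar. dict_to_message is a hypothetical, simplified json-dict -> pb helper, not code from any of the converters mentioned above, and it only handles scalars, lists of scalars and nested messages.

```python
from google.protobuf.descriptor_pb2 import FileDescriptorProto


def dict_to_message(data, msg):
    """Fill msg from a dict decoded from json (simplified, hypothetical helper)."""
    for key, value in data.items():
        if isinstance(value, dict):
            sub = getattr(msg, key)
            # The important part: mark the sub-message as present
            # even when nothing inside it gets set.
            sub.SetInParent()
            dict_to_message(value, sub)
        elif isinstance(value, list):
            # Repeated scalar field; extending with [] changes nothing.
            getattr(msg, key).extend(value)
        else:
            setattr(msg, key, value)
    return msg


# Just touching the sub-message does not mark it as present.
plain = FileDescriptorProto()
_ = plain.options
print(plain.HasField("options"))

# SetInParent() marks it as present, and that survives serialization.
marked = FileDescriptorProto()
marked.options.SetInParent()
restored = FileDescriptorProto.FromString(marked.SerializeToString())
print(marked.HasField("options"), restored.HasField("options"))

# The same thing going through the json-like dict.
converted = dict_to_message(
    {"name": "demo.proto", "dependency": ["a.proto"], "options": {}},
    FileDescriptorProto(),
)
print(converted.HasField("options"))
```

HasField() is the way to check from python whether the optional message is considered set.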