Sunday, June 15, 2014

Troubles with zip

A part of porting itertools, I've had to figure out how various functions have to be limited because Swift just isn't as flexible as Python.

Python's zip takes N iterables, and generates N-tuples. Like this:
    >>> a = [1, 2, 3]
    >>> b = ["spam", "eggs", "spam"]
    >>> list(zip(a, b))
    [(1, "spam"), (2, "eggs"), (3, "spam")]
If you've never used a language with zip, you may not realize how useful it is. If you have, you're probably surprised that it doesn't come built in with Swift. But we can build it ourselves… sort of.

Variadic arguments

Swift—like Python, C++, ObjC, and even C—allows you to take variadic arguments. And at first glance, it seems like they've captured everything you need with a syntax that's as simple as Python's, and maybe even more readable:
    def zip(*sequences) # sequences is a list of all arguments
    func zip(sequences: Sequence...) # sequences is an array of all arguments
So, we're set, right?

Of course not. That just specifies one or more sequences of type Sequence. All of them have to be the same type. This is a direct consequence of the fact that variadic arguments are passed to your code as an Array (just as they're passed to Python code as a list… but of course Python lists are heterogeneous arrays, so that's not a problem). And there's really no other option, given the rest of Swift's design.

Why couldn't they have used a tuple instead of an array? Well, there's no way to iterate or index a tuple in Swift, so that would have been completely useless. And, even if there were, how would you get at the types of the values? Well, C++ has a solution to that: you get a tuple of values, and a type tuple of types.

In fact, even C solves this problem. In a horribly unsafe way, of course (you have to somehow pass the type information out-of-band, and then switch on that to pull out each value correctly). But still…

The code above will actually compile, because I've made the sequences all of type Sequence, but that isn't a useful type. (As explained somewhere in here, you need a generic function or type that can supply the associated types for Sequence if you want to do anything with it.)

Does zip really need to be heterogeneous?

Yes. That's the whole point of zip. You can take an Int[] and a String[] and get back an (Int, String)[].

(Transposing a T[][] is a useful thing to do, and zip happens to work for that in Python. But it's not the main function of zip, and I don't see any problem writing it as a separate function with a different name.)

Building tuples

Given that it's impossible to access a tuple dynamically, it's probably no surprise that it's also impossible to build one dynamically. So, even if you could somehow access a heterogenous tuple of sequences, you couldn't actually return anything useful from the next method.

Any way around the problem?

The way C++ used to handle this kind of thing, before parameter packs, was to use overloads: you define 10 functions with the same name, taking from 0 to 9 parameters, copying and pasting and modifying the code for each one. This is hideous, but between the preprocessor (which Swift doesn't have) and some clever compile-time metaprogramming with templates (which Swift doesn't have) you can hide the hideousness. But, even if you were willing to live with that hideousness, you don't have a choice, because in Swift, functions don't have overloads, only methods.

Another way to handle this in C++ is with default values. In C++, you had to define a special singleton type to use as a sentinel that could be detected via type inference. You could then either switch on that sentinel with if statements, or use partial specialization to get a similar effect to overloads. The good news is that Swift already provides the perfect value, nil, which you can use just by making all of your types optional. The OK news is that if statements are your only option. The bad news is that it won't work anyway. Here's an example:
    func foo(_ t0: T0?=nil, _ t1: T1?=nil, _ t2: T2?=nil) {
        if t0 {
            if t1 {
                if t2 {
                    println("\(t0) \(t1) \(t2)")
                } else {
                    println("\(t0) \(t1)")
            } else {
        } else {
If you try to call this with one, or two arguments, you'll get an error "cannot convert the expression's type '()' to type 'Int'" (or 'StringLiteralConvertible', or whatever the type of the last argument is). That error doesn't make a lot of sense, but it's clearly a type-inference problem, and if you think about it, it makes sense: The compiler has no problem figuring out that T1 is an Int if you pass an Int for t1, but if you leave t2 off (or if you pass an untyped nil literal, which has the same effect), how can it possibly figure out what type T2 is? It's just tried to print the error for the wrong type.

If you try to call it with zero or three arguments, you get a compiler crash instead. Which I assume is the same error handled even worse.

Falling back to dynamic (ObjC) types?

Needless to say, everything is a lot more successful if you try to define things in terms of Sequences of Any (or of AnyObject, or NSArrays, or…), except for the slight problem that you're very unlikely to actually have sequences of Any lying around that you want to zip up, so you still wouldn't be able to call zip on, say, the examples given at the top.

Fill value

What should zip do if given sequences of different lengths? There are three basic possibilities:
    • Stop as soon as any of the sequences stops.
        Keep going until all sequences stop, filling in missing values with some specified fill value.
          Keep going until all sequences stop, generating short tuples. The third one sounds like a good idea at first, but it turns out to be weird and rarely useful—which is good, because it would be very hard to implement in Swift. Python provides the other two as separate functions, zip and zip_longest. For zip_longest, the fill value is an extra keyword parameter after the sequences.

          The problem should be immediately obvious: If the two sequences are of different types, you need two different fill values, right?

          But often, there's a value that will work for both types, possibly with a cast. The obvious case here is nil, which you might even want to be the default.

          In order to specify something like that in a static language, you'd need some way to constrain the value to be convertible to both of the other types. Something like this:
              func zip_longest, U: Convertible>
                  (s0: S0, s1: S1, #fillvalue: U) { … }
          Except that protocols aren't generics, they're simple types with associated types that get matched up separately. So, U has to implement Convertible with associated type T0, and also implement Convertible with associated type T1, which… doesn't work. Unlike Java/C#-style generic interfaces, where being Convertible to T0 and Convertible to T1 are separate types that can be specified separately, Swift's associated-type-based protocols mean they're both the same type, Convertible, with the same associated type, U.ConvertibleTo, which has to be either T0 or T1.

          Could we specify two separate Convertible types? Sure:
              func zip_longest
          But this won't do any good, because fillvalue has to be either U0 or U1, it can't be both.

          I tried adding another type in, U, specifying U: U0 and U: U1, but (a) that leads to a compiler segfault, and (b) I don't think there's any way the compiler could infer the type anyway, meaning you'd have to figure out and type out all that mess every time you wanted to call the function.


          Ultimately, I went with zip with exactly two sequences, zip_longest with exactly two sequences that must be optional (filling with nil), and zip_fill with exactly two sequences plus two fillvalues, not because that's a good design, but because it seems to be the only thing even remotely possible in Swift.
  • No comments:

    Post a Comment