A part of porting itertools, I've had to figure out how various functions have to be limited because Swift just isn't as flexible as Python.

Python's zip takes N iterables, and generates N-tuples. Like this:

    >>> a = [1, 2, 3]
    >>> b = ["spam", "eggs", "spam"]
    >>> list(zip(a, b))
    [(1, "spam"), (2, "eggs"), (3, "spam")]

If you've never used a language with zip, you may not realize how useful it is. If you have, you're probably surprised that it doesn't come built in with Swift. But we can build it ourselves… sort of.

Variadic arguments

Swift—like Python, C++, ObjC, and even C—allows you to take variadic arguments. And at first glance, it seems like they've captured everything you need with a syntax that's as simple as Python's, and maybe even more readable:

    def zip(*sequences) # sequences is a list of all arguments
    func zip(sequences: Sequence...) # sequences is an array of all arguments

So, we're set, right?

Of course not. That just specifies one or more sequences of type Sequence. All of them have to be the same type. This is a direct consequence of the fact that variadic arguments are passed to your code as an Array (just as they're passed to Python code as a list… but of course Python lists are heterogeneous arrays, so that's not a problem). And there's really no other option, given the rest of Swift's design.

Why couldn't they have used a tuple instead of an array? Well, there's no way to iterate or index a tuple in Swift, so that would have been completely useless. And, even if there were, how would you get at the types of the values? Well, C++ has a solution to that: you get a tuple of values, and a type tuple of types.

In fact, even C solves this problem. In a horribly unsafe way, of course (you have to somehow pass the type information out-of-band, and then switch on that to pull out each value correctly). But still…

The code above will actually compile, because I've made the sequences all of type Sequence, but that isn't a useful type. (As explained somewhere in here, you need a generic function or type that can supply the associated types for Sequence if you want to do anything with it.)

Does zip really need to be heterogeneous?

Yes. That's the whole point of zip. You can take an Int[] and a String[] and get back an (Int, String)[].

(Transposing a T[][] is a useful thing to do, and zip happens to work for that in Python. But it's not the main function of zip, and I don't see any problem writing it as a separate function with a different name.)

Building tuples

Given that it's impossible to access a tuple dynamically, it's probably no surprise that it's also impossible to build one dynamically. So, even if you could somehow access a heterogenous tuple of sequences, you couldn't actually return anything useful from the next method.

Any way around the problem?

The way C++ used to handle this kind of thing, before parameter packs, was to use overloads: you define 10 functions with the same name, taking from 0 to 9 parameters, copying and pasting and modifying the code for each one. This is hideous, but between the preprocessor (which Swift doesn't have) and some clever compile-time metaprogramming with templates (which Swift doesn't have) you can hide the hideousness. But, even if you were willing to live with that hideousness, you don't have a choice, because in Swift, functions don't have overloads, only methods.

Another way to handle this in C++ is with default values. In C++, you had to define a special singleton type to use as a sentinel that could be detected via type inference. You could then either switch on that sentinel with if statements, or use partial specialization to get a similar effect to overloads. The good news is that Swift already provides the perfect value, nil, which you can use just by making all of your types optional. The OK news is that if statements are your only option. The bad news is that it won't work anyway. Here's an example:

    func foo(_ t0: T0?=nil, _ t1: T1?=nil, _ t2: T2?=nil) {
        if t0 {
            if t1 {
                if t2 {
                    println("\(t0) \(t1) \(t2)")
                } else {
                    println("\(t0) \(t1)")
                }
            } else {
                println("\(t0)")
            }
        } else {
            println("nothing")
        }
    }

If you try to call this with one, or two arguments, you'll get an error "cannot convert the expression's type '()' to type 'Int'" (or 'StringLiteralConvertible', or whatever the type of the last argument is). That error doesn't make a lot of sense, but it's clearly a type-inference problem, and if you think about it, it makes sense: The compiler has no problem figuring out that T1 is an Int if you pass an Int for t1, but if you leave t2 off (or if you pass an untyped nil literal, which has the same effect), how can it possibly figure out what type T2 is? It's just tried to print the error for the wrong type.

If you try to call it with zero or three arguments, you get a compiler crash instead. Which I assume is the same error handled even worse.

Falling back to dynamic (ObjC) types?

Needless to say, everything is a lot more successful if you try to define things in terms of Sequences of Any (or of AnyObject, or NSArrays, or…), except for the slight problem that you're very unlikely to actually have sequences of Any lying around that you want to zip up, so you still wouldn't be able to call zip on, say, the examples given at the top.

Fill value

What should zip do if given sequences of different lengths? There are three basic possibilities:

    func zip_longest, U: Convertible>
        (s0: S0, s1: S1, #fillvalue: U) { … }

    func zip_longest

Stupid Swift Ideas

Sunday, June 15, 2014

Troubles with zip

Variadic arguments

Does zip really need to be heterogeneous?

Building tuples

Any way around the problem?

Falling back to dynamic (ObjC) types?

Fill value

Conclusions

No comments:

Post a Comment