ak.cartesian
Defined in awkward.operations.structure on line 3114.
- ak.cartesian(arrays, axis=1, nested=None, parameters=None, with_name=None, highlevel=True, behavior=None)
- Parameters:
arrays (dict or iterable of arrays) – Arrays on which to compute the Cartesian product.
axis (int) – The dimension at which this operation is applied. The outermost dimension is 0, followed by 1, etc., and negative values count backward from the innermost: -1 is the innermost dimension, -2 is the next level up, etc.
nested (None, True, False, or iterable of str or int) – If None or False, all combinations of elements from the arrays are produced at the same level of nesting; if True, they are grouped in nested lists by combinations that share a common item from each of the arrays; if an iterable of str or int, group common items for a chosen set of keys from the arrays dict or integer slots of the arrays iterable.
parameters (None or dict) – Parameters for the new ak.layout.RecordArray node that is created by this operation.
with_name (None or str) – Assigns a "__record__" name to the new ak.layout.RecordArray node that is created by this operation (overriding parameters, if necessary).
highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.layout.Content subclass.
behavior (None or dict) – Custom ak.behavior for the output array, if high-level.
Computes a Cartesian product (i.e. cross product) of data from a set of
arrays. This operation creates records (if arrays is a dict) or tuples
(if arrays is another kind of iterable) that hold the combinations
of elements, and it can introduce new levels of nesting.
As a simple example with axis=0, the Cartesian product of
>>> one = ak.Array([1, 2, 3])
>>> two = ak.Array(["a", "b"])
is
>>> ak.to_list(ak.cartesian([one, two], axis=0))
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]
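For plain Python lists of scalars, the axis=0 result has the same contents and order as the standard library's itertools.product. The sketch below, built around a hypothetical cartesian_axis0 helper, only illustrates the semantics; it is not how Awkward Array computes the product.

```python
from itertools import product


def cartesian_axis0(*arrays):
    """Pure-Python sketch of ak.cartesian(arrays, axis=0) for lists of scalars."""
    # itertools.product varies the rightmost argument fastest, which gives
    # the same lexicographic order as the ak.cartesian example above.
    return list(product(*arrays))


one = [1, 2, 3]
two = ["a", "b"]
print(cartesian_axis0(one, two))
```

The printed list should match the ak.to_list output shown above.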
With nesting, a new level of nested lists is created to group combinations
that share the same element from one into the same list.
>>> ak.to_list(ak.cartesian([one, two], axis=0, nested=True))
[[(1, 'a'), (1, 'b')], [(2, 'a'), (2, 'b')], [(3, 'a'), (3, 'b')]]
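With two arrays, nested=True at axis=0 amounts to one inner list per element of the first array. The following pure-Python sketch (the helper name is hypothetical) reproduces the grouped output above with a nested comprehension.

```python
def cartesian_axis0_nested(one, two):
    """Sketch of ak.cartesian([one, two], axis=0, nested=True) for two lists."""
    # Outer loop over `one` opens a new inner list; inner loop pairs with `two`.
    return [[(x, y) for y in two] for x in one]


print(cartesian_axis0_nested([1, 2, 3], ["a", "b"]))
```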
The primary purpose of this function, however, is to compute a different
Cartesian product for each element of an array: in other words, axis=1.
The following arrays each have four elements.
>>> one = ak.Array([[1, 2, 3], [], [4, 5], [6]])
>>> two = ak.Array([["a", "b"], ["c"], ["d"], ["e", "f"]])
The default axis=1 produces 6 pairs from the Cartesian product of
[1, 2, 3] and ["a", "b"], 0 pairs from [] and ["c"], 2 pairs from
[4, 5] and ["d"], and 2 pairs from [6] and ["e", "f"].
>>> ak.to_list(ak.cartesian([one, two]))
[[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')],
[],
[(4, 'd'), (5, 'd')],
[(6, 'e'), (6, 'f')]]
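In list-of-lists terms, axis=1 pairs up the i-th inner lists of each array and takes a separate product for each pair. A minimal sketch, assuming both inputs are plain Python lists with the same number of outer elements (as in the example above); cartesian_axis1 is a hypothetical name.

```python
from itertools import product


def cartesian_axis1(one, two):
    """Sketch of ak.cartesian([one, two]) with the default axis=1."""
    if len(one) != len(two):
        raise ValueError("arrays must have the same outer length")
    # One independent Cartesian product per pair of inner lists.
    return [list(product(a, b)) for a, b in zip(one, two)]


one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]
print([len(pairs) for pairs in cartesian_axis1(one, two)])
```

The lengths printed are the per-element pair counts 6, 0, 2 and 2.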
By default, the output has the same nesting depth as the original arrays;
with nested=True, the nesting depth increases by 1 and the tuples are
grouped by their first element.
>>> ak.to_list(ak.cartesian([one, two], nested=True))
[[[(1, 'a'), (1, 'b')], [(2, 'a'), (2, 'b')], [(3, 'a'), (3, 'b')]],
[],
[[(4, 'd')], [(5, 'd')]],
[[(6, 'e'), (6, 'f')]]]
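Adding nested=True at axis=1 inserts one more list level inside each outer element, grouping pairs by their item from the first array. The pure-Python sketch below (hypothetical helper, plain lists assumed) reproduces the output above; note that an empty inner list in the first array yields an empty group list.

```python
def cartesian_axis1_nested(one, two):
    """Sketch of ak.cartesian([one, two], nested=True) for lists of lists."""
    # For each outer element, build one inner list per item of `a`.
    return [[[(x, y) for y in b] for x in a] for a, b in zip(one, two)]


one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]
print(cartesian_axis1_nested(one, two))
```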
These tuples are ak.layout.RecordArray nodes with unnamed fields. To
name the fields, we can pass one and two in a dict, rather than a list.
>>> ak.to_list(ak.cartesian({"x": one, "y": two}))
[
[{'x': 1, 'y': 'a'},
{'x': 1, 'y': 'b'},
{'x': 2, 'y': 'a'},
{'x': 2, 'y': 'b'},
{'x': 3, 'y': 'a'},
{'x': 3, 'y': 'b'}],
[],
[{'x': 4, 'y': 'd'},
{'x': 5, 'y': 'd'}],
[{'x': 6, 'y': 'e'},
{'x': 6, 'y': 'f'}]
]
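Passing a dict swaps the unnamed tuple slots for named fields. In plain Python the analogous structure is a list of dicts; the sketch below (hypothetical helper, not Awkward's record machinery) zips the dict keys onto each combination and keeps the keys in the dict's insertion order.

```python
from itertools import product


def cartesian_axis1_records(arrays):
    """Sketch of ak.cartesian({"x": ..., "y": ...}) for a dict of lists of lists."""
    keys = list(arrays)
    columns = [arrays[k] for k in keys]
    # zip(*columns) walks the outer elements of every array in lockstep;
    # each combination becomes a dict keyed by the field names.
    return [
        [dict(zip(keys, combo)) for combo in product(*sublists)]
        for sublists in zip(*columns)
    ]


one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]
print(cartesian_axis1_records({"x": one, "y": two})[2])
```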
With more than two arrays in the Cartesian product, nested can specify
which are grouped and which are not. For example,
>>> one = ak.Array([1, 2, 3, 4])
>>> two = ak.Array([1.1, 2.2, 3.3])
>>> three = ak.Array(["a", "b"])
can be left entirely ungrouped:
>>> ak.to_list(ak.cartesian([one, two, three], axis=0))
[
(1, 1.1, 'a'),
(1, 1.1, 'b'),
(1, 2.2, 'a'),
(1, 2.2, 'b'),
(1, 3.3, 'a'),
(1, 3.3, 'b'),
(2, 1.1, 'a'),
(2, 1.1, 'b'),
(2, 2.2, 'a'),
(2, 2.2, 'b'),
(2, 3.3, 'a'),
(2, 3.3, 'b'),
(3, 1.1, 'a'),
(3, 1.1, 'b'),
(3, 2.2, 'a'),
(3, 2.2, 'b'),
(3, 3.3, 'a'),
(3, 3.3, 'b'),
(4, 1.1, 'a'),
(4, 1.1, 'b'),
(4, 2.2, 'a'),
(4, 2.2, 'b'),
(4, 3.3, 'a'),
(4, 3.3, 'b')
]
can be grouped by one (adding 1 more dimension):
>>> ak.to_list(ak.cartesian([one, two, three], axis=0, nested=[0]))
[
[(1, 1.1, 'a'), (1, 1.1, 'b'), (1, 2.2, 'a'), (1, 2.2, 'b'), (1, 3.3, 'a'), (1, 3.3, 'b')],
[(2, 1.1, 'a'), (2, 1.1, 'b'), (2, 2.2, 'a'), (2, 2.2, 'b'), (2, 3.3, 'a'), (2, 3.3, 'b')],
[(3, 1.1, 'a'), (3, 1.1, 'b'), (3, 2.2, 'a'), (3, 2.2, 'b'), (3, 3.3, 'a'), (3, 3.3, 'b')],
[(4, 1.1, 'a'), (4, 1.1, 'b'), (4, 2.2, 'a'), (4, 2.2, 'b'), (4, 3.3, 'a'), (4, 3.3, 'b')]
]
can be grouped by one and two (adding 2 more dimensions):
>>> ak.to_list(ak.cartesian([one, two, three], axis=0, nested=[0, 1]))
[
[
[(1, 1.1, 'a'), (1, 1.1, 'b')],
[(1, 2.2, 'a'), (1, 2.2, 'b')],
[(1, 3.3, 'a'), (1, 3.3, 'b')]
],
[
[(2, 1.1, 'a'), (2, 1.1, 'b')],
[(2, 2.2, 'a'), (2, 2.2, 'b')],
[(2, 3.3, 'a'), (2, 3.3, 'b')]
],
[
[(3, 1.1, 'a'), (3, 1.1, 'b')],
[(3, 2.2, 'a'), (3, 2.2, 'b')],
[(3, 3.3, 'a'), (3, 3.3, 'b')]
],
[
[(4, 1.1, 'a'), (4, 1.1, 'b')],
[(4, 2.2, 'a'), (4, 2.2, 'b')],
[(4, 3.3, 'a'), (4, 3.3, 'b')]
]
]
or grouped by unique one-two pairs (adding 1 more dimension):
>>> ak.to_list(ak.cartesian([one, two, three], axis=0, nested=[1]))
[
[(1, 1.1, 'a'), (1, 1.1, 'b')],
[(1, 2.2, 'a'), (1, 2.2, 'b')],
[(1, 3.3, 'a'), (1, 3.3, 'b')],
[(2, 1.1, 'a'), (2, 1.1, 'b')],
[(2, 2.2, 'a'), (2, 2.2, 'b')],
[(2, 3.3, 'a'), (2, 3.3, 'b')],
[(3, 1.1, 'a'), (3, 1.1, 'b')],
[(3, 2.2, 'a'), (3, 2.2, 'b')],
[(3, 3.3, 'a'), (3, 3.3, 'b')],
[(4, 1.1, 'a'), (4, 1.1, 'b')],
[(4, 2.2, 'a'), (4, 2.2, 'b')],
[(4, 3.3, 'a'), (4, 3.3, 'b')]
]
The order of the output is fixed: it is always lexicographical in the order
that the arrays are written. Thus, it is not possible to group by three
in the example above.
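The three grouped examples follow one rule: a new list level opens after each slot listed in nested, so nested=[i] groups combinations that share their items from slots 0 through i. The pure-Python sketch below implements that reading for axis=0 with a hypothetical cartesian_nested helper. Treating nested=True as "every slot except the last" and rejecting the last slot are assumptions of this sketch, chosen to mirror the statements that nested=True groups by the first element and that grouping by three is not possible.

```python
def cartesian_nested(arrays, nested=None):
    """Sketch of ak.cartesian(arrays, axis=0, nested=...) for plain lists."""
    if nested is True:
        nested = range(len(arrays) - 1)  # assumed meaning of nested=True
    elif nested is None or nested is False:
        nested = ()
    slots = set(nested)
    if any(not 0 <= i < len(arrays) - 1 for i in slots):
        # Grouping by the last array would leave nothing to group.
        raise ValueError("nested slots must be in range(len(arrays) - 1)")

    def build(slot, prefix):
        if slot == len(arrays):
            return [tuple(prefix)]
        out = []
        for item in arrays[slot]:
            sub = build(slot + 1, prefix + [item])
            if slot in slots:
                out.append(sub)  # open a new list level for this item
            else:
                out.extend(sub)  # stay at the current level
        return out

    return build(0, [])


one, two, three = [1, 2, 3, 4], [1.1, 2.2, 3.3], ["a", "b"]
print(len(cartesian_nested([one, two, three], nested=[1])))
```

The print shows the 12 groups of the nested=[1] example; nested=[0] and nested=[0, 1] reproduce the other two layouts.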
To emulate an SQL or Pandas “group by” operation, put the keys that you
wish to group by first and use nested=[0] to group by the first key alone,
or nested=[n] to group by unique (n + 1)-tuples of the leading keys. If
necessary, record keys can later be reordered with a list of strings in
ak.Array.__getitem__.
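Because the output order is lexicographic, each group that nested=[n] creates is a consecutive run of the flat product sharing its first n + 1 items. The sketch below uses itertools.groupby to turn those runs into a group-by mapping; group_by_leading is a hypothetical helper, and it assumes the values within each key array are distinct (as SQL group keys would be), since runs of equal values would otherwise merge.

```python
from itertools import groupby, product


def group_by_leading(arrays, n):
    """Map each unique (n + 1)-tuple of leading keys to its rows of the product."""
    flat = product(*arrays)  # lexicographic, rightmost array varies fastest
    return {
        key: list(rows)
        for key, rows in groupby(flat, key=lambda row: row[: n + 1])
    }


one, two, three = [1, 2, 3, 4], [1.1, 2.2, 3.3], ["a", "b"]
groups = group_by_leading([one, two, three], 1)
print(groups[(2, 2.2)])
```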
To get list index positions in the tuples/records, rather than data from
the original arrays, use ak.argcartesian instead of ak.cartesian. The
ak.argcartesian form can be particularly useful as nested indexing in
ak.Array.__getitem__.
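To see why index positions are useful, the pure-Python sketch below builds index pairs for the axis=1 example, assuming (as the text suggests) that they are positions within each inner list, and then gathers with them to recover the same values ak.cartesian returns. The helper name argcartesian_axis1 is hypothetical and stands in for ak.argcartesian only for illustration.

```python
from itertools import product


def argcartesian_axis1(one, two):
    """Sketch of ak.argcartesian([one, two]) at axis=1: index pairs, not values."""
    return [list(product(range(len(a)), range(len(b)))) for a, b in zip(one, two)]


one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]
idx = argcartesian_axis1(one, two)

# Gathering with the index pairs reproduces the Cartesian product of values.
values = [[(a[i], b[j]) for i, j in pairs] for a, b, pairs in zip(one, two, idx)]
print(idx[2], values[2])
```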