Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refer to indices by cytnx_int64 or cytnx_uint64 #444

Open
manuschneider opened this issue Aug 20, 2024 · 5 comments
Open

Refer to indices by cytnx_int64 or cytnx_uint64 #444

manuschneider opened this issue Aug 20, 2024 · 5 comments

Comments

@manuschneider
Copy link
Collaborator

manuschneider commented Aug 20, 2024

When referring to positive numbers like index or block numbers, the data structure is either a cytnx_int64 or a cytnx_uint64.
For example, UniTensor.permute(std::vector<cytnx_int64>) and UniTensor.at(std::vector<cytnx_uint64>).
For UniTensor.get_block() both versions exist and the difference (if any) is not obvious to me.

This is a bit confusing and inconstant. While it should not cause problems on the Python side, the mix leads to difficulties with C++ because the two types are not automatically converted, at least when combined in vectors.

Is there a reason for different conventions? Could everything be changed to one convention? How can this be done without breaking compatibility with existing code, tests, etc?

@kaihsin
Copy link
Member

kaihsin commented Aug 21, 2024

There is. Originally some of the API are design for large Tensor. Meaning that if you don't need to support -1 for indexing like python, the C++ side can utilize full 64bit address indexing. the API that using int64 is only to make compatible with Python -1 indexing

@manuschneider
Copy link
Collaborator Author

I see. But perumte does not work with negative indices anyways in the current implementation, at least for BlockUniTensor

@yingjerkao
Copy link
Collaborator

There is. Originally some of the API are design for large Tensor. Meaning that if you don't need to support -1 for indexing like python, the C++ side can utilize full 64bit address indexing. the API that using int64 is only to make compatible with Python -1 indexing

Can we do it at Pybind11 level to make C++ code consistent?

@IvanaGyro
Copy link
Collaborator

IvanaGyro commented Sep 1, 2024

Here is a crop from the Google C++ style guide that we apply. In short, only use unsigned integers when representing a bit pattern and use size_t and ptrdiff_t when appropriate.

Unsigned integers are good for representing bitfields and modular arithmetic. Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point. The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler. In other cases, the defined behavior impedes optimization.

That said, mixing signedness of integer types is responsible for an equally large class of problems. The best advice we can provide: try to use iterators and containers rather than pointers and sizes, try not to mix signedness, and try to avoid unsigned types (except for representing bitfields or modular arithmetic). Do not use an unsigned type merely to assert that a variable is non-negative.

@ianmccul
Copy link

ianmccul commented Sep 1, 2024

I suggest use int (or some other signed type) for everything. But note that size_t is unsigned, so don't use that! use ssize_t.

C++20 has std::ssize() function for containers, so it is safe to write eg
Container<T> v; for (auto i = std::ssize(v)-1; i > 0; --i) { ... }

The MPToolkit is currently C++14 and not ready to use C++20 yet (not enough compiler support), but meanwhile I'm just going to add some ssize() function myself.

In new code, my feeling is always use a signed integer, unless it is actually bit-flipping. Having a container using size_type as a signed type will work fine, and if you mix it with standard containers it will harmlessly convert to an unsigned type. But always use std::ssize() when getting the size of containers when it needs to work with std containers as well, and similarly wrap other existing uses of unsigned types.

See also:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf

Bjarne also advocates for making all .size() functions return a signed int: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1491r0.pdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants