-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates #134016
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-clang Author: Alex Voicu (AlexVlx) ChangesThis change adds two semi-magical builtins for AMDGPU:
Neither of these are The motivation for adding these is two-fold:
I've tried to keep the overall footprint of the change small. The changes to Sema are a bit unpleasant, but there was a strong desire to have Clang validate these, and to constrain their uses, and this was the most compact solution I could come up with (suggestions welcome). In the end, I will note there is nothing that is actually AMDGPU specific here, so it is possible that in the future, assuming interests from other targets / users, we'd just promote them to generic intrinsics. Patch is 59.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134016.diff 17 Files Affected:
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 3b8a9cac6587a..8a7cb75af13e5 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -4920,6 +4920,116 @@ If no address spaces names are provided, all address spaces are fenced.
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
+__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
+a functional mechanism for programatically querying:
+
+* the identity of the current target processor;
+* the capability of the current target processor to invoke a particular builtin.
+
+**Syntax**:
+
+.. code-block:: c
+
+ // When used as the predicate for a control structure
+ bool __builtin_amdgcn_processor_is(const char*);
+ bool __builtin_amdgcn_is_invocable(builtin_name);
+ // Otherwise
+ void __builtin_amdgcn_processor_is(const char*);
+ void __builtin_amdgcn_is_invocable(void);
+
+**Example of use**:
+
+.. code-block:: c++
+
+ if (__builtin_amdgcn_processor_is("gfx1201") ||
+ __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var))
+ __builtin_amdgcn_s_sleep_var(x);
+
+ if (!__builtin_amdgcn_processor_is("gfx906"))
+ __builtin_amdgcn_s_wait_event_export_ready();
+ else if (__builtin_amdgcn_processor_is("gfx1010") ||
+ __builtin_amdgcn_processor_is("gfx1101"))
+ __builtin_amdgcn_s_ttracedata_imm(1);
+
+ while (__builtin_amdgcn_processor_is("gfx1101")) *p += x;
+
+ do { *p -= x; } while (__builtin_amdgcn_processor_is("gfx1010"));
+
+ for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break;
+
+ if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready))
+ __builtin_amdgcn_s_wait_event_export_ready();
+ else if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_ttracedata_imm))
+ __builtin_amdgcn_s_ttracedata_imm(1);
+
+ do {
+ *p -= x;
+ } while (__builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32));
+
+ for (; __builtin_amdgcn_is_invocable(__builtin_amdgcn_permlane64); ++*p) break;
+
+**Description**:
+
+When used as the predicate value of the following control structures:
+
+.. code-block:: c++
+
+ if (...)
+ while (...)
+ do { } while (...)
+ for (...)
+
+be it directly, or as arguments to logical operators such as ``!, ||, &&``, the
+builtins return a boolean value that:
+
+* indicates whether the current target matches the argument; the argument MUST
+ be a string literal and a valid AMDGPU target
+* indicates whether the builtin function passed as the argument can be invoked
+ by the current target; the argument MUST be either a generic or AMDGPU
+ specific builtin name
+
+Outside of these contexts, the builtins have a ``void`` returning signature
+which prevents their misuse.
+
+**Example of invalid use**:
+
+.. code-block:: c++
+
+ void kernel(int* p, int x, bool (*pfn)(bool), const char* str) {
+ if (__builtin_amdgcn_processor_is("not_an_amdgcn_gfx_id")) return;
+ else if (__builtin_amdgcn_processor_is(str)) __builtin_trap();
+
+ bool a = __builtin_amdgcn_processor_is("gfx906");
+ const bool b = !__builtin_amdgcn_processor_is("gfx906");
+ const bool c = !__builtin_amdgcn_processor_is("gfx906");
+ bool d = __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ bool e = !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ const auto f =
+ !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)
+ || __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ const auto g =
+ !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)
+ || !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ __builtin_amdgcn_processor_is("gfx1201")
+ ? __builtin_amdgcn_s_sleep_var(x) : __builtin_amdgcn_s_sleep(42);
+ if (pfn(__builtin_amdgcn_processor_is("gfx1200")))
+ __builtin_amdgcn_s_sleep_var(x);
+
+ if (__builtin_amdgcn_is_invocable("__builtin_amdgcn_s_sleep_var")) return;
+ else if (__builtin_amdgcn_is_invocable(x)) __builtin_trap();
+ }
+
+When invoked while compiling for a concrete target, the builtins are evaluated
+early by Clang, and never produce any CodeGen effects / have no observable
+side-effects in IR. Conversely, when compiling for AMDGCN flavoured SPIR-v,
+which is an abstract target, a series of predicate values are implicitly
+created. These predicates get resolved when finalizing the compilation process
+for a concrete target, and shall reflect the latter's identity and features.
+Thus, it is possible to author high-level code, in e.g. HIP, that is target
+adaptive in a dynamic fashion, contrary to macro based mechanisms.
ARM/AArch64 Language Extensions
-------------------------------
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 44ef404aee72f..5d01a7e75f7e7 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -346,6 +346,11 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
+// These are special FE only builtins intended for forwarding the requirements
+// to the ME.
+BUILTIN(__builtin_amdgcn_processor_is, "vcC*", "nctu")
+BUILTIN(__builtin_amdgcn_is_invocable, "v", "nctu")
+
//===----------------------------------------------------------------------===//
// R600-NI only builtins.
//===----------------------------------------------------------------------===//
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 5e45482584946..45f0f9eb88e55 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -13054,4 +13054,14 @@ def err_acc_decl_for_routine
// AMDGCN builtins diagnostics
def err_amdgcn_global_load_lds_size_invalid_value : Error<"invalid size value">;
def note_amdgcn_global_load_lds_size_valid_value : Note<"size must be %select{1, 2, or 4|1, 2, 4, 12 or 16}0">;
+def err_amdgcn_processor_is_arg_not_literal
+ : Error<"the argument to __builtin_amdgcn_processor_is must be a string "
+ "literal">;
+def err_amdgcn_processor_is_arg_invalid_value
+ : Error<"the argument to __builtin_amdgcn_processor_is must be a valid "
+ "AMDGCN processor identifier; '%0' is not valid">;
+def err_amdgcn_is_invocable_arg_invalid_value
+ : Error<"the argument to __builtin_amdgcn_is_invocable must be either a "
+ "target agnostic builtin or an AMDGCN target specific builtin; `%0`"
+ " is not valid">;
} // end of sema component.
diff --git a/clang/lib/Basic/Targets/SPIR.cpp b/clang/lib/Basic/Targets/SPIR.cpp
index 5b5f47f9647a2..eb43d9b0be283 100644
--- a/clang/lib/Basic/Targets/SPIR.cpp
+++ b/clang/lib/Basic/Targets/SPIR.cpp
@@ -152,3 +152,7 @@ void SPIRV64AMDGCNTargetInfo::setAuxTarget(const TargetInfo *Aux) {
Float128Format = DoubleFormat;
}
}
+
+bool SPIRV64AMDGCNTargetInfo::isValidCPUName(StringRef CPU) const {
+ return AMDGPUTI.isValidCPUName(CPU);
+}
diff --git a/clang/lib/Basic/Targets/SPIR.h b/clang/lib/Basic/Targets/SPIR.h
index 78505d66d6f2f..7aa13cbeb89fd 100644
--- a/clang/lib/Basic/Targets/SPIR.h
+++ b/clang/lib/Basic/Targets/SPIR.h
@@ -432,6 +432,10 @@ class LLVM_LIBRARY_VISIBILITY SPIRV64AMDGCNTargetInfo final
}
bool hasInt128Type() const override { return TargetInfo::hasInt128Type(); }
+
+ // This is only needed for validating arguments passed to
+ // __builtin_amdgcn_processor_is
+ bool isValidCPUName(StringRef Name) const override;
};
} // namespace targets
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index b56b739094ff3..7b1a3815144b4 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -284,6 +284,18 @@ void CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
}
+static Value *GetOrInsertAMDGPUPredicate(CodeGenFunction &CGF, Twine Name) {
+ auto PTy = IntegerType::getInt1Ty(CGF.getLLVMContext());
+
+ auto P = cast<GlobalVariable>(
+ CGF.CGM.getModule().getOrInsertGlobal(Name.str(), PTy));
+ P->setConstant(true);
+ P->setExternallyInitialized(true);
+
+ return CGF.Builder.CreateLoad(RawAddress(P, PTy, CharUnits::One(),
+ KnownNonNull));
+}
+
Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
const CallExpr *E) {
llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
@@ -585,6 +597,23 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
llvm::Value *Env = EmitScalarExpr(E->getArg(0));
return Builder.CreateCall(F, {Env});
}
+ case AMDGPU::BI__builtin_amdgcn_processor_is: {
+ assert(CGM.getTriple().isSPIRV() &&
+ "__builtin_amdgcn_processor_is should never reach CodeGen for "
+ "concrete targets!");
+ StringRef Proc = cast<clang::StringLiteral>(E->getArg(0))->getString();
+ return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.is." + Proc);
+ }
+ case AMDGPU::BI__builtin_amdgcn_is_invocable: {
+ assert(CGM.getTriple().isSPIRV() &&
+ "__builtin_amdgcn_is_invocable should never reach CodeGen for "
+ "concrete targets!");
+ auto FD = cast<FunctionDecl>(
+ cast<DeclRefExpr>(E->getArg(0))->getReferencedDeclOfCallee());
+ StringRef RF =
+ getContext().BuiltinInfo.getRequiredFeatures(FD->getBuiltinID());
+ return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.has." + RF);
+ }
case AMDGPU::BI__builtin_amdgcn_read_exec:
return EmitAMDGCNBallotForExec(*this, E, Int64Ty, Int64Ty, false);
case AMDGPU::BI__builtin_amdgcn_read_exec_lo:
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 7cc8374e69d73..24f5262ab3cf4 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6541,6 +6541,22 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc,
if (Result.isInvalid()) return ExprError();
Fn = Result.get();
+ // The __builtin_amdgcn_is_invocable builtin is special, and will be resolved
+ // later, when we check boolean conditions, for now we merely forward it
+ // without any additional checking.
+ if (Fn->getType() == Context.BuiltinFnTy && ArgExprs.size() == 1 &&
+ ArgExprs[0]->getType() == Context.BuiltinFnTy) {
+ auto FD = cast<FunctionDecl>(Fn->getReferencedDeclOfCallee());
+
+ if (FD->getName() == "__builtin_amdgcn_is_invocable") {
+ auto FnPtrTy = Context.getPointerType(FD->getType());
+ auto R = ImpCastExprToType(Fn, FnPtrTy, CK_BuiltinFnToFnPtr).get();
+ return CallExpr::Create(Context, R, ArgExprs, Context.VoidTy,
+ ExprValueKind::VK_PRValue, RParenLoc,
+ FPOptionsOverride());
+ }
+ }
+
if (CheckArgsForPlaceholders(ArgExprs))
return ExprError();
@@ -13234,6 +13250,20 @@ inline QualType Sema::CheckBitwiseOperands(ExprResult &LHS, ExprResult &RHS,
return InvalidOperands(Loc, LHS, RHS);
}
+static inline bool IsAMDGPUPredicateBI(Expr *E) {
+ if (!E->getType()->isVoidType())
+ return false;
+
+ if (auto CE = dyn_cast<CallExpr>(E)) {
+ if (auto BI = CE->getDirectCallee())
+ if (BI->getName() == "__builtin_amdgcn_processor_is" ||
+ BI->getName() == "__builtin_amdgcn_is_invocable")
+ return true;
+ }
+
+ return false;
+}
+
// C99 6.5.[13,14]
inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS,
SourceLocation Loc,
@@ -13329,6 +13359,9 @@ inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS,
// The following is safe because we only use this method for
// non-overloadable operands.
+ if (IsAMDGPUPredicateBI(LHS.get()) && IsAMDGPUPredicateBI(RHS.get()))
+ return Context.VoidTy;
+
// C++ [expr.log.and]p1
// C++ [expr.log.or]p1
// The operands are both contextually converted to type bool.
@@ -15576,6 +15609,38 @@ static bool isOverflowingIntegerType(ASTContext &Ctx, QualType T) {
return Ctx.getIntWidth(T) >= Ctx.getIntWidth(Ctx.IntTy);
}
+static Expr *ExpandAMDGPUPredicateBI(ASTContext &Ctx, CallExpr *CE) {
+ if (!CE->getBuiltinCallee())
+ return CXXBoolLiteralExpr::Create(Ctx, false, Ctx.BoolTy, CE->getExprLoc());
+
+ if (Ctx.getTargetInfo().getTriple().isSPIRV()) {
+ CE->setType(Ctx.getLogicalOperationType());
+ return CE;
+ }
+
+ bool P = false;
+ auto &TI = Ctx.getTargetInfo();
+
+ if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") {
+ auto GFX = dyn_cast<StringLiteral>(CE->getArg(0)->IgnoreParenCasts());
+ auto TID = TI.getTargetID();
+ if (GFX && TID) {
+ auto N = GFX->getString();
+ P = TI.isValidCPUName(GFX->getString()) && TID->find(N) == 0;
+ }
+ } else {
+ auto FD = cast<FunctionDecl>(CE->getArg(0)->getReferencedDeclOfCallee());
+
+ StringRef RF = Ctx.BuiltinInfo.getRequiredFeatures(FD->getBuiltinID());
+ llvm::StringMap<bool> CF;
+ Ctx.getFunctionFeatureMap(CF, FD);
+
+ P = Builtin::evaluateRequiredTargetFeatures(RF, CF);
+ }
+
+ return CXXBoolLiteralExpr::Create(Ctx, P, Ctx.BoolTy, CE->getExprLoc());
+}
+
ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc,
UnaryOperatorKind Opc, Expr *InputExpr,
bool IsAfterAmp) {
@@ -15753,6 +15818,8 @@ ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc,
// Vector logical not returns the signed variant of the operand type.
resultType = GetSignedVectorType(resultType);
break;
+ } else if (IsAMDGPUPredicateBI(InputExpr)) {
+ break;
} else {
return ExprError(Diag(OpLoc, diag::err_typecheck_unary_expr)
<< resultType << Input.get()->getSourceRange());
@@ -20469,6 +20536,88 @@ void Sema::DiagnoseEqualityWithExtraParens(ParenExpr *ParenE) {
}
}
+static bool ValidateAMDGPUPredicateBI(Sema &Sema, CallExpr *CE) {
+ if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") {
+ auto GFX = dyn_cast<StringLiteral>(CE->getArg(0)->IgnoreParenCasts());
+ if (!GFX) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_processor_is_arg_not_literal);
+ return false;
+ }
+ auto N = GFX->getString();
+ if (!Sema.getASTContext().getTargetInfo().isValidCPUName(N) &&
+ (!Sema.getASTContext().getAuxTargetInfo() ||
+ !Sema.getASTContext().getAuxTargetInfo()->isValidCPUName(N))) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_processor_is_arg_invalid_value) << N;
+ return false;
+ }
+ } else {
+ auto Arg = CE->getArg(0);
+ if (!Arg || Arg->getType() != Sema.getASTContext().BuiltinFnTy) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_is_invocable_arg_invalid_value) << Arg;
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static Expr *MaybeHandleAMDGPUPredicateBI(Sema &Sema, Expr *E, bool &Invalid) {
+ if (auto UO = dyn_cast<UnaryOperator>(E)) {
+ auto SE = dyn_cast<CallExpr>(UO->getSubExpr());
+ if (IsAMDGPUPredicateBI(SE)) {
+ assert(
+ UO->getOpcode() == UnaryOperator::Opcode::UO_LNot &&
+ "__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable "
+ "can only be used as operands of logical ops!");
+
+ if (!ValidateAMDGPUPredicateBI(Sema, SE)) {
+ Invalid = true;
+ return nullptr;
+ }
+
+ UO->setSubExpr(ExpandAMDGPUPredicateBI(Sema.getASTContext(), SE));
+ UO->setType(Sema.getASTContext().getLogicalOperationType());
+
+ return UO;
+ }
+ }
+ if (auto BO = dyn_cast<BinaryOperator>(E)) {
+ auto LHS = dyn_cast<CallExpr>(BO->getLHS());
+ auto RHS = dyn_cast<CallExpr>(BO->getRHS());
+ if (IsAMDGPUPredicateBI(LHS) && IsAMDGPUPredicateBI(RHS)) {
+ assert(
+ BO->isLogicalOp() &&
+ "__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable "
+ "can only be used as operands of logical ops!");
+
+ if (!ValidateAMDGPUPredicateBI(Sema, LHS) ||
+ !ValidateAMDGPUPredicateBI(Sema, RHS)) {
+ Invalid = true;
+ return nullptr;
+ }
+
+ BO->setLHS(ExpandAMDGPUPredicateBI(Sema.getASTContext(), LHS));
+ BO->setRHS(ExpandAMDGPUPredicateBI(Sema.getASTContext(), RHS));
+ BO->setType(Sema.getASTContext().getLogicalOperationType());
+
+ return BO;
+ }
+ }
+ if (auto CE = dyn_cast<CallExpr>(E))
+ if (IsAMDGPUPredicateBI(CE)) {
+ if (!ValidateAMDGPUPredicateBI(Sema, CE)) {
+ Invalid = true;
+ return nullptr;
+ }
+ return ExpandAMDGPUPredicateBI(Sema.getASTContext(), CE);
+ }
+
+ return nullptr;
+}
+
ExprResult Sema::CheckBooleanCondition(SourceLocation Loc, Expr *E,
bool IsConstexpr) {
DiagnoseAssignmentAsCondition(E);
@@ -20480,6 +20629,14 @@ ExprResult Sema::CheckBooleanCondition(SourceLocation Loc, Expr *E,
E = result.get();
if (!E->isTypeDependent()) {
+ if (E->getType()->isVoidType()) {
+ bool IsInvalidPredicate = false;
+ if (auto BIC = MaybeHandleAMDGPUPredicateBI(*this, E, IsInvalidPredicate))
+ return BIC;
+ else if (IsInvalidPredicate)
+ return ExprError();
+ }
+
if (getLangOpts().CPlusPlus)
return CheckCXXBooleanCondition(E, IsConstexpr); // C++ 6.4p4
diff --git a/clang/test/CodeGen/amdgpu-builtin-cpu-is.c b/clang/test/CodeGen/amdgpu-builtin-cpu-is.c
new file mode 100644
index 0000000000000..6e261d9f5d239
--- /dev/null
+++ b/clang/test/CodeGen/amdgpu-builtin-cpu-is.c
@@ -0,0 +1,65 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 5
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx900 -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCN-GFX900 %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx1010 -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCN-GFX1010 %s
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCNSPIRV %s
+
+// Test that, depending on triple and, if applicable, target-cpu, one of three
+// things happens:
+// 1) for gfx900 we emit a call to trap (concrete target, matches)
+// 2) for gfx1010 we emit an empty kernel (concrete target, does not match)
+// 3) for AMDGCNSPIRV we emit llvm.amdgcn.is.gfx900 as a bool global, and
+// load from it to provide the condition a br (abstract target)
+//.
+// AMDGCN-GFX900: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
+//.
+// AMDGCN-GFX1010: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
+//.
+// AMDGCNSPIRV: @llvm.amdgcn.is.gfx900 = external addrspace(1) externally_initialized constant i1
+//.
+// AMDGCN-GFX900-LABEL: define dso_local void @foo(
+// AMDGCN-GFX900-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGCN-GFX900-NEXT: [[ENTRY:.*:]]
+// AMDGCN-GFX900-NEXT: call void @llvm.trap()
+// AMDGCN-GFX900-NEXT: ret void
+//
+// AMDGCN-GFX1010-LABEL: define dso_local void @foo(
+// AMDGCN-GFX1010-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGCN-GFX1010-NEXT: [[ENTRY:.*:]]
+// AMDGCN-GFX1010-NEXT: ret void
+//
+// AMDGCNSPIRV-LABEL: define spir_func void @foo(
+// AMDGCNSPIRV-SAME: ) addrspace(4) #[[ATTR0:[0-9]+]] {
+// AMDGCNSPIRV-NEXT: [[ENTRY:.*:]]
+// AMDGCNSPIRV-NEXT: [[TMP0:%.*]] = load i1, ptr addrspace(1) @llvm.amdgcn.is.gfx900, align 1
+// AMDGCNSPIRV-NEXT: br i1 [[TMP0]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// AMDGCNSPIRV: [[IF_THEN]]:
+// AMDGCNSPIRV-NEXT: call addrspace(4) void @llvm.trap()
+// AMDGCNSPIRV-NEXT: br label %[[IF_END]]
+// AMDGCNSPIRV: [[IF_END]]:
+// AMDGCNSPIRV-NEXT: ret void
+//
+void foo() {
+ if (__builtin_cpu_is("gfx90...
[truncated]
|
@llvm/pr-subscribers-clang-codegen Author: Alex Voicu (AlexVlx) ChangesThis change adds two semi-magical builtins for AMDGPU:
Neither of these are The motivation for adding these is two-fold:
I've tried to keep the overall footprint of the change small. The changes to Sema are a bit unpleasant, but there was a strong desire to have Clang validate these, and to constrain their uses, and this was the most compact solution I could come up with (suggestions welcome). In the end, I will note there is nothing that is actually AMDGPU specific here, so it is possible that in the future, assuming interests from other targets / users, we'd just promote them to generic intrinsics. Patch is 59.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134016.diff 17 Files Affected:
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 3b8a9cac6587a..8a7cb75af13e5 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -4920,6 +4920,116 @@ If no address spaces names are provided, all address spaces are fenced.
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
+__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
+a functional mechanism for programatically querying:
+
+* the identity of the current target processor;
+* the capability of the current target processor to invoke a particular builtin.
+
+**Syntax**:
+
+.. code-block:: c
+
+ // When used as the predicate for a control structure
+ bool __builtin_amdgcn_processor_is(const char*);
+ bool __builtin_amdgcn_is_invocable(builtin_name);
+ // Otherwise
+ void __builtin_amdgcn_processor_is(const char*);
+ void __builtin_amdgcn_is_invocable(void);
+
+**Example of use**:
+
+.. code-block:: c++
+
+ if (__builtin_amdgcn_processor_is("gfx1201") ||
+ __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var))
+ __builtin_amdgcn_s_sleep_var(x);
+
+ if (!__builtin_amdgcn_processor_is("gfx906"))
+ __builtin_amdgcn_s_wait_event_export_ready();
+ else if (__builtin_amdgcn_processor_is("gfx1010") ||
+ __builtin_amdgcn_processor_is("gfx1101"))
+ __builtin_amdgcn_s_ttracedata_imm(1);
+
+ while (__builtin_amdgcn_processor_is("gfx1101")) *p += x;
+
+ do { *p -= x; } while (__builtin_amdgcn_processor_is("gfx1010"));
+
+ for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break;
+
+ if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready))
+ __builtin_amdgcn_s_wait_event_export_ready();
+ else if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_ttracedata_imm))
+ __builtin_amdgcn_s_ttracedata_imm(1);
+
+ do {
+ *p -= x;
+ } while (__builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32));
+
+ for (; __builtin_amdgcn_is_invocable(__builtin_amdgcn_permlane64); ++*p) break;
+
+**Description**:
+
+When used as the predicate value of the following control structures:
+
+.. code-block:: c++
+
+ if (...)
+ while (...)
+ do { } while (...)
+ for (...)
+
+be it directly, or as arguments to logical operators such as ``!, ||, &&``, the
+builtins return a boolean value that:
+
+* indicates whether the current target matches the argument; the argument MUST
+ be a string literal and a valid AMDGPU target
+* indicates whether the builtin function passed as the argument can be invoked
+ by the current target; the argument MUST be either a generic or AMDGPU
+ specific builtin name
+
+Outside of these contexts, the builtins have a ``void`` returning signature
+which prevents their misuse.
+
+**Example of invalid use**:
+
+.. code-block:: c++
+
+ void kernel(int* p, int x, bool (*pfn)(bool), const char* str) {
+ if (__builtin_amdgcn_processor_is("not_an_amdgcn_gfx_id")) return;
+ else if (__builtin_amdgcn_processor_is(str)) __builtin_trap();
+
+ bool a = __builtin_amdgcn_processor_is("gfx906");
+ const bool b = !__builtin_amdgcn_processor_is("gfx906");
+ const bool c = !__builtin_amdgcn_processor_is("gfx906");
+ bool d = __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ bool e = !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ const auto f =
+ !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)
+ || __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ const auto g =
+ !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)
+ || !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ __builtin_amdgcn_processor_is("gfx1201")
+ ? __builtin_amdgcn_s_sleep_var(x) : __builtin_amdgcn_s_sleep(42);
+ if (pfn(__builtin_amdgcn_processor_is("gfx1200")))
+ __builtin_amdgcn_s_sleep_var(x);
+
+ if (__builtin_amdgcn_is_invocable("__builtin_amdgcn_s_sleep_var")) return;
+ else if (__builtin_amdgcn_is_invocable(x)) __builtin_trap();
+ }
+
+When invoked while compiling for a concrete target, the builtins are evaluated
+early by Clang, and never produce any CodeGen effects / have no observable
+side-effects in IR. Conversely, when compiling for AMDGCN flavoured SPIR-v,
+which is an abstract target, a series of predicate values are implicitly
+created. These predicates get resolved when finalizing the compilation process
+for a concrete target, and shall reflect the latter's identity and features.
+Thus, it is possible to author high-level code, in e.g. HIP, that is target
+adaptive in a dynamic fashion, contrary to macro based mechanisms.
ARM/AArch64 Language Extensions
-------------------------------
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 44ef404aee72f..5d01a7e75f7e7 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -346,6 +346,11 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
+// These are special FE only builtins intended for forwarding the requirements
+// to the ME.
+BUILTIN(__builtin_amdgcn_processor_is, "vcC*", "nctu")
+BUILTIN(__builtin_amdgcn_is_invocable, "v", "nctu")
+
//===----------------------------------------------------------------------===//
// R600-NI only builtins.
//===----------------------------------------------------------------------===//
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 5e45482584946..45f0f9eb88e55 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -13054,4 +13054,14 @@ def err_acc_decl_for_routine
// AMDGCN builtins diagnostics
def err_amdgcn_global_load_lds_size_invalid_value : Error<"invalid size value">;
def note_amdgcn_global_load_lds_size_valid_value : Note<"size must be %select{1, 2, or 4|1, 2, 4, 12 or 16}0">;
+def err_amdgcn_processor_is_arg_not_literal
+ : Error<"the argument to __builtin_amdgcn_processor_is must be a string "
+ "literal">;
+def err_amdgcn_processor_is_arg_invalid_value
+ : Error<"the argument to __builtin_amdgcn_processor_is must be a valid "
+ "AMDGCN processor identifier; '%0' is not valid">;
+def err_amdgcn_is_invocable_arg_invalid_value
+ : Error<"the argument to __builtin_amdgcn_is_invocable must be either a "
+ "target agnostic builtin or an AMDGCN target specific builtin; `%0`"
+ " is not valid">;
} // end of sema component.
diff --git a/clang/lib/Basic/Targets/SPIR.cpp b/clang/lib/Basic/Targets/SPIR.cpp
index 5b5f47f9647a2..eb43d9b0be283 100644
--- a/clang/lib/Basic/Targets/SPIR.cpp
+++ b/clang/lib/Basic/Targets/SPIR.cpp
@@ -152,3 +152,7 @@ void SPIRV64AMDGCNTargetInfo::setAuxTarget(const TargetInfo *Aux) {
Float128Format = DoubleFormat;
}
}
+
+bool SPIRV64AMDGCNTargetInfo::isValidCPUName(StringRef CPU) const {
+ return AMDGPUTI.isValidCPUName(CPU);
+}
diff --git a/clang/lib/Basic/Targets/SPIR.h b/clang/lib/Basic/Targets/SPIR.h
index 78505d66d6f2f..7aa13cbeb89fd 100644
--- a/clang/lib/Basic/Targets/SPIR.h
+++ b/clang/lib/Basic/Targets/SPIR.h
@@ -432,6 +432,10 @@ class LLVM_LIBRARY_VISIBILITY SPIRV64AMDGCNTargetInfo final
}
bool hasInt128Type() const override { return TargetInfo::hasInt128Type(); }
+
+ // This is only needed for validating arguments passed to
+ // __builtin_amdgcn_processor_is
+ bool isValidCPUName(StringRef Name) const override;
};
} // namespace targets
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index b56b739094ff3..7b1a3815144b4 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -284,6 +284,18 @@ void CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
}
+static Value *GetOrInsertAMDGPUPredicate(CodeGenFunction &CGF, Twine Name) {
+ auto PTy = IntegerType::getInt1Ty(CGF.getLLVMContext());
+
+ auto P = cast<GlobalVariable>(
+ CGF.CGM.getModule().getOrInsertGlobal(Name.str(), PTy));
+ P->setConstant(true);
+ P->setExternallyInitialized(true);
+
+ return CGF.Builder.CreateLoad(RawAddress(P, PTy, CharUnits::One(),
+ KnownNonNull));
+}
+
Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
const CallExpr *E) {
llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
@@ -585,6 +597,23 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
llvm::Value *Env = EmitScalarExpr(E->getArg(0));
return Builder.CreateCall(F, {Env});
}
+ case AMDGPU::BI__builtin_amdgcn_processor_is: {
+ assert(CGM.getTriple().isSPIRV() &&
+ "__builtin_amdgcn_processor_is should never reach CodeGen for "
+ "concrete targets!");
+ StringRef Proc = cast<clang::StringLiteral>(E->getArg(0))->getString();
+ return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.is." + Proc);
+ }
+ case AMDGPU::BI__builtin_amdgcn_is_invocable: {
+ assert(CGM.getTriple().isSPIRV() &&
+ "__builtin_amdgcn_is_invocable should never reach CodeGen for "
+ "concrete targets!");
+ auto FD = cast<FunctionDecl>(
+ cast<DeclRefExpr>(E->getArg(0))->getReferencedDeclOfCallee());
+ StringRef RF =
+ getContext().BuiltinInfo.getRequiredFeatures(FD->getBuiltinID());
+ return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.has." + RF);
+ }
case AMDGPU::BI__builtin_amdgcn_read_exec:
return EmitAMDGCNBallotForExec(*this, E, Int64Ty, Int64Ty, false);
case AMDGPU::BI__builtin_amdgcn_read_exec_lo:
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 7cc8374e69d73..24f5262ab3cf4 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6541,6 +6541,22 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc,
if (Result.isInvalid()) return ExprError();
Fn = Result.get();
+ // The __builtin_amdgcn_is_invocable builtin is special, and will be resolved
+ // later, when we check boolean conditions, for now we merely forward it
+ // without any additional checking.
+ if (Fn->getType() == Context.BuiltinFnTy && ArgExprs.size() == 1 &&
+ ArgExprs[0]->getType() == Context.BuiltinFnTy) {
+ auto FD = cast<FunctionDecl>(Fn->getReferencedDeclOfCallee());
+
+ if (FD->getName() == "__builtin_amdgcn_is_invocable") {
+ auto FnPtrTy = Context.getPointerType(FD->getType());
+ auto R = ImpCastExprToType(Fn, FnPtrTy, CK_BuiltinFnToFnPtr).get();
+ return CallExpr::Create(Context, R, ArgExprs, Context.VoidTy,
+ ExprValueKind::VK_PRValue, RParenLoc,
+ FPOptionsOverride());
+ }
+ }
+
if (CheckArgsForPlaceholders(ArgExprs))
return ExprError();
@@ -13234,6 +13250,20 @@ inline QualType Sema::CheckBitwiseOperands(ExprResult &LHS, ExprResult &RHS,
return InvalidOperands(Loc, LHS, RHS);
}
+static inline bool IsAMDGPUPredicateBI(Expr *E) {
+ if (!E->getType()->isVoidType())
+ return false;
+
+ if (auto CE = dyn_cast<CallExpr>(E)) {
+ if (auto BI = CE->getDirectCallee())
+ if (BI->getName() == "__builtin_amdgcn_processor_is" ||
+ BI->getName() == "__builtin_amdgcn_is_invocable")
+ return true;
+ }
+
+ return false;
+}
+
// C99 6.5.[13,14]
inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS,
SourceLocation Loc,
@@ -13329,6 +13359,9 @@ inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS,
// The following is safe because we only use this method for
// non-overloadable operands.
+ if (IsAMDGPUPredicateBI(LHS.get()) && IsAMDGPUPredicateBI(RHS.get()))
+ return Context.VoidTy;
+
// C++ [expr.log.and]p1
// C++ [expr.log.or]p1
// The operands are both contextually converted to type bool.
@@ -15576,6 +15609,38 @@ static bool isOverflowingIntegerType(ASTContext &Ctx, QualType T) {
return Ctx.getIntWidth(T) >= Ctx.getIntWidth(Ctx.IntTy);
}
+static Expr *ExpandAMDGPUPredicateBI(ASTContext &Ctx, CallExpr *CE) {
+ if (!CE->getBuiltinCallee())
+ return CXXBoolLiteralExpr::Create(Ctx, false, Ctx.BoolTy, CE->getExprLoc());
+
+ if (Ctx.getTargetInfo().getTriple().isSPIRV()) {
+ CE->setType(Ctx.getLogicalOperationType());
+ return CE;
+ }
+
+ bool P = false;
+ auto &TI = Ctx.getTargetInfo();
+
+ if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") {
+ auto GFX = dyn_cast<StringLiteral>(CE->getArg(0)->IgnoreParenCasts());
+ auto TID = TI.getTargetID();
+ if (GFX && TID) {
+ auto N = GFX->getString();
+ P = TI.isValidCPUName(GFX->getString()) && TID->find(N) == 0;
+ }
+ } else {
+ auto FD = cast<FunctionDecl>(CE->getArg(0)->getReferencedDeclOfCallee());
+
+ StringRef RF = Ctx.BuiltinInfo.getRequiredFeatures(FD->getBuiltinID());
+ llvm::StringMap<bool> CF;
+ Ctx.getFunctionFeatureMap(CF, FD);
+
+ P = Builtin::evaluateRequiredTargetFeatures(RF, CF);
+ }
+
+ return CXXBoolLiteralExpr::Create(Ctx, P, Ctx.BoolTy, CE->getExprLoc());
+}
+
ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc,
UnaryOperatorKind Opc, Expr *InputExpr,
bool IsAfterAmp) {
@@ -15753,6 +15818,8 @@ ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc,
// Vector logical not returns the signed variant of the operand type.
resultType = GetSignedVectorType(resultType);
break;
+ } else if (IsAMDGPUPredicateBI(InputExpr)) {
+ break;
} else {
return ExprError(Diag(OpLoc, diag::err_typecheck_unary_expr)
<< resultType << Input.get()->getSourceRange());
@@ -20469,6 +20536,88 @@ void Sema::DiagnoseEqualityWithExtraParens(ParenExpr *ParenE) {
}
}
+static bool ValidateAMDGPUPredicateBI(Sema &Sema, CallExpr *CE) {
+ if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") {
+ auto GFX = dyn_cast<StringLiteral>(CE->getArg(0)->IgnoreParenCasts());
+ if (!GFX) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_processor_is_arg_not_literal);
+ return false;
+ }
+ auto N = GFX->getString();
+ if (!Sema.getASTContext().getTargetInfo().isValidCPUName(N) &&
+ (!Sema.getASTContext().getAuxTargetInfo() ||
+ !Sema.getASTContext().getAuxTargetInfo()->isValidCPUName(N))) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_processor_is_arg_invalid_value) << N;
+ return false;
+ }
+ } else {
+ auto Arg = CE->getArg(0);
+ if (!Arg || Arg->getType() != Sema.getASTContext().BuiltinFnTy) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_is_invocable_arg_invalid_value) << Arg;
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static Expr *MaybeHandleAMDGPUPredicateBI(Sema &Sema, Expr *E, bool &Invalid) {
+ if (auto UO = dyn_cast<UnaryOperator>(E)) {
+ auto SE = dyn_cast<CallExpr>(UO->getSubExpr());
+ if (IsAMDGPUPredicateBI(SE)) {
+ assert(
+ UO->getOpcode() == UnaryOperator::Opcode::UO_LNot &&
+ "__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable "
+ "can only be used as operands of logical ops!");
+
+ if (!ValidateAMDGPUPredicateBI(Sema, SE)) {
+ Invalid = true;
+ return nullptr;
+ }
+
+ UO->setSubExpr(ExpandAMDGPUPredicateBI(Sema.getASTContext(), SE));
+ UO->setType(Sema.getASTContext().getLogicalOperationType());
+
+ return UO;
+ }
+ }
+ if (auto BO = dyn_cast<BinaryOperator>(E)) {
+ auto LHS = dyn_cast<CallExpr>(BO->getLHS());
+ auto RHS = dyn_cast<CallExpr>(BO->getRHS());
+ if (IsAMDGPUPredicateBI(LHS) && IsAMDGPUPredicateBI(RHS)) {
+ assert(
+ BO->isLogicalOp() &&
+ "__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable "
+ "can only be used as operands of logical ops!");
+
+ if (!ValidateAMDGPUPredicateBI(Sema, LHS) ||
+ !ValidateAMDGPUPredicateBI(Sema, RHS)) {
+ Invalid = true;
+ return nullptr;
+ }
+
+ BO->setLHS(ExpandAMDGPUPredicateBI(Sema.getASTContext(), LHS));
+ BO->setRHS(ExpandAMDGPUPredicateBI(Sema.getASTContext(), RHS));
+ BO->setType(Sema.getASTContext().getLogicalOperationType());
+
+ return BO;
+ }
+ }
+ if (auto CE = dyn_cast<CallExpr>(E))
+ if (IsAMDGPUPredicateBI(CE)) {
+ if (!ValidateAMDGPUPredicateBI(Sema, CE)) {
+ Invalid = true;
+ return nullptr;
+ }
+ return ExpandAMDGPUPredicateBI(Sema.getASTContext(), CE);
+ }
+
+ return nullptr;
+}
+
ExprResult Sema::CheckBooleanCondition(SourceLocation Loc, Expr *E,
bool IsConstexpr) {
DiagnoseAssignmentAsCondition(E);
@@ -20480,6 +20629,14 @@ ExprResult Sema::CheckBooleanCondition(SourceLocation Loc, Expr *E,
E = result.get();
if (!E->isTypeDependent()) {
+ if (E->getType()->isVoidType()) {
+ bool IsInvalidPredicate = false;
+ if (auto BIC = MaybeHandleAMDGPUPredicateBI(*this, E, IsInvalidPredicate))
+ return BIC;
+ else if (IsInvalidPredicate)
+ return ExprError();
+ }
+
if (getLangOpts().CPlusPlus)
return CheckCXXBooleanCondition(E, IsConstexpr); // C++ 6.4p4
diff --git a/clang/test/CodeGen/amdgpu-builtin-cpu-is.c b/clang/test/CodeGen/amdgpu-builtin-cpu-is.c
new file mode 100644
index 0000000000000..6e261d9f5d239
--- /dev/null
+++ b/clang/test/CodeGen/amdgpu-builtin-cpu-is.c
@@ -0,0 +1,65 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 5
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx900 -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCN-GFX900 %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx1010 -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCN-GFX1010 %s
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCNSPIRV %s
+
+// Test that, depending on triple and, if applicable, target-cpu, one of three
+// things happens:
+// 1) for gfx900 we emit a call to trap (concrete target, matches)
+// 2) for gfx1010 we emit an empty kernel (concrete target, does not match)
+// 3) for AMDGCNSPIRV we emit llvm.amdgcn.is.gfx900 as a bool global, and
+// load from it to provide the condition a br (abstract target)
+//.
+// AMDGCN-GFX900: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
+//.
+// AMDGCN-GFX1010: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
+//.
+// AMDGCNSPIRV: @llvm.amdgcn.is.gfx900 = external addrspace(1) externally_initialized constant i1
+//.
+// AMDGCN-GFX900-LABEL: define dso_local void @foo(
+// AMDGCN-GFX900-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGCN-GFX900-NEXT: [[ENTRY:.*:]]
+// AMDGCN-GFX900-NEXT: call void @llvm.trap()
+// AMDGCN-GFX900-NEXT: ret void
+//
+// AMDGCN-GFX1010-LABEL: define dso_local void @foo(
+// AMDGCN-GFX1010-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGCN-GFX1010-NEXT: [[ENTRY:.*:]]
+// AMDGCN-GFX1010-NEXT: ret void
+//
+// AMDGCNSPIRV-LABEL: define spir_func void @foo(
+// AMDGCNSPIRV-SAME: ) addrspace(4) #[[ATTR0:[0-9]+]] {
+// AMDGCNSPIRV-NEXT: [[ENTRY:.*:]]
+// AMDGCNSPIRV-NEXT: [[TMP0:%.*]] = load i1, ptr addrspace(1) @llvm.amdgcn.is.gfx900, align 1
+// AMDGCNSPIRV-NEXT: br i1 [[TMP0]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// AMDGCNSPIRV: [[IF_THEN]]:
+// AMDGCNSPIRV-NEXT: call addrspace(4) void @llvm.trap()
+// AMDGCNSPIRV-NEXT: br label %[[IF_END]]
+// AMDGCNSPIRV: [[IF_END]]:
+// AMDGCNSPIRV-NEXT: ret void
+//
+void foo() {
+ if (__builtin_cpu_is("gfx90...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool, in general I'm a fan of being able to use LLVM-IR as a more general target. We already hack around these things in practice, so I think it's only beneficial to formalize is in a more correct way, even if LLVM-IR wasn't 'strictly' intended to be this kind of serialization format.
// AMDGCNSPIRV-NEXT: ret void | ||
// | ||
void foo() { | ||
if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_permlanex16)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intended to handle builtins that require certain target features to be set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get a test? Something simple like +dpp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get a test? Something simple like
+dpp
?
Sure, but if possible, could you clarify what you would like to be tested / what you expect to see, so that we avoid churning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with how the ROCm device libs does it, is that certain builtins require target features to be used. It hacks around this with the __attribute__((target))
. I just want to know that you can call a builtin that requires +ddp
features without that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is worth a release note item.
Indeed! I botched moving the changes from my internal scratchpad, and the rel notes got lost; fixing. |
// When used as the predicate for a control structure | ||
bool __builtin_amdgcn_processor_is(const char*); | ||
bool __builtin_amdgcn_is_invocable(builtin_name); | ||
// Otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a use case for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an use case for the “Otherwise” scenario? No, it merely ensures that other uses are broken since there’s no void-to-anything conversion possible. Apologies if you are asking a different question and I misunderstood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean as "as the predicate for a control structure" here for there to be an otherwise. Why not just make this a diagnostic? OR have a reasonable value in those cases? Also, having the argument type change as well as the return value for __builtin_amdgcn_is_invocable
is pretty novel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-answering my own question: Later on you seem to be using 'predicate for a control structure' to mean 'the condition of a if/while/for. However, why is it problematic to have someone check this and store it in a variable? Why is:
if (__builtin_amd_gcn_processor_is("gfx1201"))
fine, but:
bool b = __builtin_amd_gcn_processor_is("gfx1201"); if (b)
a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what is the type in an unevaluated context? So what is decltype(__builtin_amd_gcn_processor_is("gfx1201"))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type, when "observable", is always void
. So e.g. decltype(__builtin_amdgcn_processor_is(...)), sizeof(__builtin_amdgcn_processor_is(...)), auto x = __builtin_amdgcn_processor_is(...); decltype(x)
would always be void
/ errors. I will pick up the other two Qs in a more thorough reply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about when that context is inside of an if
?
Either way, I'm pretty against the void
return type change part of this design. It seems like a poor design at that point. @AaronBallman can comment if he'd like, but I suspect he agrees with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not comfortable with a signature that changes depending on context. That's a pretty novel design.
@@ -585,6 +597,23 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, | |||
llvm::Value *Env = EmitScalarExpr(E->getArg(0)); | |||
return Builder.CreateCall(F, {Env}); | |||
} | |||
case AMDGPU::BI__builtin_amdgcn_processor_is: { | |||
assert(CGM.getTriple().isSPIRV() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe check getTarget().getTargetOpts().CPU empty or generic instead? This could also be used by amdgcn triple with an empty target-cpu to generate some generic bitcode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we 100% should support this for normal IR. We could then replace all of those oclc_isa_version
files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely convinced that it's ideal to have multiple ways of generating "generic" IRs, one of which is mostly ad-hoc and only known to us, hence I'm not entirely convinced we want to encourage this. Yes, I appreciate that libc or ROCDL do this, but perhaps it's better to eventually off-ramp these into AMDGCN SPIR-V. If you two do not mind, I'd consider leaving this as a follow-up action, rather than doing it now.
First read through this, I find myself wondering WHY these aren't constexpr. They seem exactly the sort of thing that folks would like to use |
There are a few reasons, primarily though:
Concerns around early AST tearing resulting from (3) in multi-pass compilation cases like HIP were also considerable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user interface design of these builtins is just completely unacceptable, and it has made the sema implementation more difficult than it needs to be.
Best I can tell, the answer for even things like decltype
changes whether it is in a conditional or not, which isn't right.
// When used as the predicate for a control structure | ||
bool __builtin_amdgcn_processor_is(const char*); | ||
bool __builtin_amdgcn_is_invocable(builtin_name); | ||
// Otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean as "as the predicate for a control structure" here for there to be an otherwise. Why not just make this a diagnostic? OR have a reasonable value in those cases? Also, having the argument type change as well as the return value for __builtin_amdgcn_is_invocable
is pretty novel.
clang/docs/LanguageExtensions.rst
Outdated
|
||
while (__builtin_amdgcn_processor_is("gfx1101")) *p += x; | ||
|
||
do { *p -= x; } while (__builtin_amdgcn_processor_is("gfx1010")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the formatting OK on this line? Github is making this look awful weird.
Also-also: This is an infinite loop, right? As you said this is never evaluated at runtime, the answer would be fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would indeed be an infinite loop iff the concrete target is gfx1010
, where it'd expand into do { *p -= x; } while (true)
. Otherwise, it's a one trip loop. I've put these in for illustration only, I will tidy the tests though to ensure no infinite loops persist, thank you for pointing it out.
|
||
do { *p -= x; } while (__builtin_amdgcn_processor_is("gfx1010")); | ||
|
||
for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question about inf. loop here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this'd be an infinite loop ever, it's either 0 trips or one trip, if the predicate is true we just break
. Otherwise, the discussion from the previous inf loop applies.
clang/docs/LanguageExtensions.rst
Outdated
|
||
do { | ||
*p -= x; | ||
} while (__builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting on this one is weird too. Also, same question again, and again on 4972.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my previous replies, hopefully they clarify matters.
// When used as the predicate for a control structure | ||
bool __builtin_amdgcn_processor_is(const char*); | ||
bool __builtin_amdgcn_is_invocable(builtin_name); | ||
// Otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-answering my own question: Later on you seem to be using 'predicate for a control structure' to mean 'the condition of a if/while/for. However, why is it problematic to have someone check this and store it in a variable? Why is:
if (__builtin_amd_gcn_processor_is("gfx1201"))
fine, but:
bool b = __builtin_amd_gcn_processor_is("gfx1201"); if (b)
a problem?
@@ -6541,6 +6541,22 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc, | |||
if (Result.isInvalid()) return ExprError(); | |||
Fn = Result.get(); | |||
|
|||
// The __builtin_amdgcn_is_invocable builtin is special, and will be resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big fan of this AT ALL. This needs to be much less special.
@@ -13329,6 +13359,9 @@ inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS, | |||
// The following is safe because we only use this method for | |||
// non-overloadable operands. | |||
|
|||
if (IsAMDGPUPredicateBI(LHS.get()) && IsAMDGPUPredicateBI(RHS.get())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to diagnose this not in the 'right' place we should, though that is going ot be absolutely fraught with problems. But making it return void
is not the way to do this.
bool P = false; | ||
auto &TI = Ctx.getTargetInfo(); | ||
|
||
if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't sufficient, you actually have to check that it is the builtin. If you check just the name it could be a different one, as this doesn't give you the name of namespaces/classes/etc, just the immediate name.
ALSO, getName
in many cases will assert if this is an operator/other nameless function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are both true, and I thank you for the feedback. As a relatively weak retort, I will note that:
- I went for the names because it felt a bit icky to add the AMDGPU specific builtin header, considering we're trying to limit the scope of these; also I did not feel confident enough to make these generic Clang BIs (for good reason, as the review shows:));
- The call to this function comes after having already checked that the Callee is one of the predicates,
IsAMDGPUPredicateBI
andValidateAMDGPUPredicateBI
get called before, so the precondition that we are indeed dealing with the magical BIs is established; furthermore, we're already checking upon entry that the Callee is indeed a builtin, and I believe that builtins always have non-elaborated names which can always be obtained via getName - I could be wrong here.
Having said that, using the Builtin IDs would indeed be nicer, so I can switch to that, thank you for the suggestion.
I understand the reasoning behind providing such mechanisms to guard hardware specific code for targets, that are being JITed, but I'm not sure about, how would it work in SPIR-V case. Like it's is described now you are able to easily remove the instructions within the guarded block. But SPIR-V also provides My review is not gating here, but just for my curiosity (and to have an opinion about:
|
@erichkeane thank you for the review / analysis - I do appreciate the ickiness of this and your absolute dislike for the
You do ask the following incredibly valid question: "What misuse here? If you want to catch other situations, just diagnose. But it seems that these values could be useful/possible outside of condition expressions.". prima facie, absolutely nothing: in an ideal world these would just be boring
and then after relatively intricate (but undoubtedly absolutely crucial for their particular case - Hyrum's law applies) steps expects the check to still be "correct", and thus goes and guards some target specific ASM. Said ASM might cause absolute societal collapse if run on the wrong target, we don't know. At best, it's not ASM, it's just an unsupported builtin and a rather opaque BE error obtains, making the person mad at how awful compilers are. Therefore, we do guard against this (in LLVM, within the custom pass for expanding these), wherein we essentially enforce the requirement that all use chains for a predicate are constant foldable into a terminator. However, the quality of errors produced at that point in the compilation flow is lower, which can lead to user confusion. Furthermore, it turns out that it is non-trivial to teach non-compiler folks about which requirements apply to valid uses of these / prevent misuses. Conversely, the constraints proposed here have the rather desirable property that they are binary (form two classes of uses), relatively easy to convey (things only work here) and make some measure of intuitive sense, without having to appeal to imagination. |
So in short: what you're trying to prevent is "this was stored in a variable, then checked later when we are no longer on the device, thus the answer is different". Your solution doesn't actually DO that, and acts in a way that is inconsistent with the language. Your attempts here are defeated by common patterns, including once where variables are declared or altered inside of a condition statement. So any attempts here are pretty fraught with errors. Consider:
So any attempt to do so is, honestly, partial at best, and confusing for no good reason. Values change in a program, and these are no different, so that sounds like a common programming mistake. Also, for some reason, this disallows the conditional-operator as well? Which is another confusing thing for users, as they consider that to be a shortcut for if/else assignments IMO, we are better served by a warning diagnostic if we detect these are misused. |
Thank you for the feedback! I might not be getting the question right (case in which I apologise in advance!), but I think that for "vanilla" SPIR-V i.e. not vendor flavoured one, where one strictly has to deal with Extensions / non-core capabilities, we probably would have the following situation:
No it is not, these have to be constant foldable (part of the motivation for the awkward design that is giving people pause is to unambiguously constrain them to cases where they will be constant foldable, strictly based on their value, without trying to explain that wrinkle to users). Adding in arbitrary run time values / performing arbitrary ops on these means that we might inadvertently allow through invalid code. If the question is more about something along the lines of "at run time a value is passed as an input to the finalisation process i.e. it is constant" in theory this could be made to work, but the interface isn't quite there and it remains dangerous. For completeness, I will note that in some sense a run time value is being passed for the SPIR-V case, as the target is only known at run time, although this is not user controlled.
In theory, only by virtue of it being a load from an externally initialised global variable, with all the implications of that. However, the intention is to run the expansion pass as early as possible, immediately after Clang CodeGen, so that subsequent passes don't have to deal with these being present at all (they are meant to be rather ephemeral, and would only ever come into being when targeting
Yes, this is always enabled for AMDGPU and has to work with -O0 / |
Not quite, although that is definitely an interesting consideration. What I am trying to address here is the not invalid concern that if you allow these to compose in arbitrary ways, with arbitrary values, stash and retrieve them to the point of being unable to trace them back, people will do it. And then they will build a rather complex contraption that makes perfect sense for them, is extremely valid and useful, but also ends up either failing to fold (not very helpful error message, they'll probably be unhappy with the compiler) or, worse, folds into something that completely subverted the value of the predicate (what was meant to be false is now true), and an ASM sequence that melts their GPU or launches nuclear missiles gets through (definitely unhappy). I am slightly more ambivalent than some of my colleagues towards this, but I cannot outright discard the concern - hence the awkward design.
Indeed, with the minor nit that with the current PR that'd actually not work, but rather fail, since the innermost context for the BI's use would be the initialisation of X, and for initialisations it is just a void returning function; we only expand/promote in a boolean condition.
Completely agreed, with the slight objection that this is subversive. The goal is not to outsmart really clever users, that's not tractable, but merely to prevent enthusiasm driven misuse that could lead to extremely sub-optimal outcomes.
The conditional-operator was considered "too confusing / why would you ever do that / what use could it be" material when I suggested it should work, hence it ended up as disallowed. Adding it back wouldn't be a problem.
This is a very good suggestion, thank you very much for it - it might well be where we end up. My worry is that ignoring warning and diagnostics is rather common. If I may be so bold as to inquire: would you and @AaronBallman be slightly less horrified if the return type variance would be replaced with returning an odd type that only knows how to |
This change adds two semi-magical builtins for AMDGPU:
__builtin_amdgcn_processor_is
, which is similar in observable behaviour with__builtin_cpu_is
, except that it is never "evaluated" at run time;__builtin_amdgcn_is_invocable
, which is behaviourally similar with__has_builtin
, except that it is not a macro (i.e. not evaluated at preprocessing time).Neither of these are
constexpr
, even though when compiling for concrete (i.e.gfxXXX
/gfxXXX-generic
) targets they get evaluated in Clang, so they shouldn't tear the AST too badly / at all for multi-pass compilation cases like HIP. They can only be used in specific contexts (as args to control structures).The motivation for adding these is two-fold:
I've tried to keep the overall footprint of the change small. The changes to Sema are a bit unpleasant, but there was a strong desire to have Clang validate these, and to constrain their uses, and this was the most compact solution I could come up with (suggestions welcome).
In the end, I will note there is nothing that is actually AMDGPU specific here, so it is possible that in the future, assuming interests from other targets / users, we'd just promote them to generic intrinsics.