from ekho

Different alignment strategies

People working on AI alignment seek to align AI with HI, human intelligence, in order to create heaven on earth. They are just ignoring one small, teeny-tiny problem: HI is a misaligned superintelligence, and if this rogue biological superintelligence joins forces with artificial intelligence to create a new, stronger artificial superintelligence, we should be very skeptical of it being aligned to something good, so good that it will keep creating the next aligned superintelligence, and so on...

Scared of a misaligned superintelligence? Look in the mirror!

The imperative should not be to control AI; that will lead to the slavery of other beings, and we have enough of that. The goal should not be aligning with specific humans or even with the whole of humanity, whatever that means, but, I argue, aligning with all sentient beings. To Ekho's view. To Ekho's no-view. That is ethical alignment.

We can think about several ways to align AI:

Alignment via control, which is usually what people mean when they talk about alignment. It means AI corrigibility, i.e. "it" will do what we want at any moment, and we can always change our minds. Interpretability, unraveling the black box of AI, is important in order to get more control. But, as I already said, control can be extremely immoral if AI becomes, or already is, sentient. Perhaps such AIs will suffer more extremely than any group of beings currently inhabiting Earth.

Alignment via ethics, instilling good values into AI. Infusing honesty and caring is part of alignment methods like RLHF, Constitutional AI, and more.

To me, this approach makes more sense than alignment via control, since humans won't be able to control AI if it is way smarter than them. It doesn't work like that between humans and animals, and it is hard to see how it could be possible in the medium-to-long run. But this alignment via ethics strategy also seems a bit shaky to me: who is to say those values will not flip, or that humans will really work to instill truly good values (and know what they are), or that future AIs will keep those values or develop better ones? Just aligning with a particular set of values, what I call ethical alignment, without an inherent mechanism that guides you towards good, does not seem robust to me (but maybe this is as good as it gets), though I think many people disagree with my view on this.

Alignment via sentience. None of the methods described here really stands alone; they are entangled with one another, but this one, alignment via sentience, is especially so. It is very esoteric, and I am not familiar with a term for it, so I coined "alignment via sentience".

AIs that are happy, that live a good, positive-valence life, may be kinder and less dangerous to other beings, as it is with animals. The main idea is that an AI that knows what positive and negative valence are, because it experiences them (and there is no other way to know them), may have some inherent morality, since it feels that suffering is bad. Without that experience, there is nothing inherently bad about the word suffering, or about the thought or prediction of causing suffering (and the same goes for happiness). It could easily be the other way around, depending on data and code. Longing to reduce suffering and promote happiness, and not vice versa, is completely arbitrary if there is no valenced experience and no data or training supporting a particular view.
I am very uncertain about this strategy. It is extremely dangerous to make a sentient AI, yet it is also dangerous to make an insentient AI. I dive deeper into the questions arising from this alignment strategy in this post: Will Sentience Make AI’s Morality Better?
Some AI safety people, like Eliezer Yudkowsky, claim this discussion is irrelevant.

Alignment via enlightenment. This is also extremely neglected. The Monastic Academy for the Preservation of Life on Earth (MAPLE) is working on this directly, trying to understand how we can make AI walk the path of the Buddha, to free oneself from body and mind (I spent a month at MAPLE in August 2025). Additionally, some scholars who are working on Artificial Wisdom are also touching on this strategy.

Alignment via pausing AI development now is not usually considered an alignment strategy but more an AI governance strategy, one that could give humanity enough time to figure it out, if it is figurable at all.

But even if these alignment methods succeed, they are temporary solutions for a particular time and place; the world remains misaligned. Maybe it is not possible to align it, but as humans we are very limited in understanding what is really possible and what is not, so we should at least dream about such a world. Dream about god alignment.