Interested in Linux, FOSS, data storage systems, unfucking our society and a bit of gaming.

I help maintain Nixpkgs.

https://github.com/Atemu
https://reddit.com/u/Atemu12 (Probably won’t be active much anymore.)

  • 15 Posts
  • 7 Comments
Joined 4 years ago
Cake day: June 25th, 2020



  • Good luck packaging new stuff

    Packaging is generally hard on any distro.

    Compared to a traditional distro, though, the packaging difficulty distribution is quite skewed with Nix: packages that follow common conventions are a lot easier to package thanks to the abstractions Nixpkgs has built for those conventions, while some packages are near impossible to package due to the unique constraints Nix (rightfully) enforces.

    good luck creating new options

    Creating options is really simple actually. Had I known you could do that earlier, I would have done so when I was starting out.

    Creating good options APIs is an art to be mastered but you don’t need to do that to get something going.

    good luck cross-compiling

    Have you ever tried cross-compiling on a traditional distro? Cross-compiling using Nixpkgs is quite easy in comparison.

    actually good luck understanding how to configure existing packages

    Yeah, no way to do so other than to read the source.

    It’s usually quite understandable without knowing the exact details though; just look at the function arguments.

    Also beats having no option to configure packages at all. Good luck slightly modifying an Arch package. It has no abstractions for this whatsoever; you have to copy and edit the source. Oh and you need to keep it up to date yourself too.

    Gentoo-like standardised flags would be great and are being worked on.

    good luck getting any kind of PR merged without the say-so of a chosen few

    Hi, one of the “chosen few” here: That’s a security feature.

    Not a particularly good one, mind you, but a security feature nonetheless.

    There’s also a merge bot running in the wild now that allows maintainers to merge automatic updates to the packages they maintain, which alleviates this a bit.

    have fun understanding why some random package is being installed and/or compiled when you switch to a new configuration.

    It can be mysterious sometimes but once you know the tools, you can directly introspect the dependency tree that is core to the concept of Nix and figure out exactly what’s happening.

    I’m not aware of the existence of any such tools in traditional distros though. What do you do on e.g. Arch if your hourly shot of -Syu goes off and fetches some package you’ve never seen before due to an update to some other package? Manually look at PKGBUILDs?


  • The writer will need to tag things down, to minimal details, for the sake of languages that they don’t care about.

    Sure and that’s likely a good bit of work.

    However, you must consider the alternative which is translating the entire text to dozens of languages and doing the same for any update done to said text. I’d assume that to be even more work by at least one order of magnitude.

    Many languages are quite similar to one another. An article written in the hypothetical abstract language and tuned on an abstract level to produce good results in German would likely produce good results in Dutch too and likely wouldn’t need much tweaking for good results in e.g. English. This has the potential to save a ton of work.

    This issue affects languages as a whole, and sometimes in ways that you can’t arbitrate through a fixed writing style because they convey meaning.

    The point of the abstract language would be to convey the meaning without requiring a language-specific writing style. The language-specific writing style to convey the specified meaning would be up to the language-specific “renderers”.

    (For example: if you don’t encode the social gender into the 3rd person pronouns, English breaks.)

    That’s up to the English “renderer” to do. If it decides to use a pronoun for e.g. a subject that identifies as male, it’d use “he”. All the abstract language’s “sentence” would contain is the concept of a male-identifying subject. (It probably shouldn’t even encode the fact that a pronoun is used as usage of pronouns instead of nouns is also language-specific. Though I guess it could be an optional tag.)

    Often there’s no such thing as the “default”. The example with pig/pork is one of those cases - if whoever is writing the article doesn’t account for the fact that English uses two concepts (pig vs. pork) for what Spanish uses one (cerdo = puerco etc.), and assumes the default (“pig”), you’ll end with stuff like *“pig consumption has increased” (i.e. “pork consumption has increased”). And the abstraction layer has no way to know if the human is talking about some living animal or its flesh.

    No, that’d simply be a mistake in building the abstract sentence. The concept of a pig was used rather than the concept of edible meat made from pig which would have been the correct subject to use in this sentence.

    Mistakes like this will happen and I’d even consider them likely to happen but the cool thing here is that “pig consumption has increased”, while obviously slightly wrong, would still be quite comprehensible. That’s an insane advantage considering that this would apply to any language for which a generic “renderer” was implemented.


    It ends like that story about a map so large that it represents the terrain accurately being as big as the terrain, thus useless.

    As I said in the top, you’ll end with a “map” that is as large as the “terrain”, thus useless. (Or: spending way more effort explicitly describing all concepts that it’s simply easier to translate it by hand.)

    I don’t see how that would necessarily be the case. Most sentences on Wikipedia are of descriptive nature and follow rather simple structures; only complicated further for the purpose of aiding text flow. Let’s take the first sentence of the Wikipedia article on Lemmy:

    Lemmy is a free and open-source software for running self-hosted social news aggregation and discussion forums.

    This could be represented in a hypothetical abstract sentence like this:

    (explanation
     (proper-noun "lemmy")
     (software-facilitating
      :kind FOSS
      :purpose (purposes
                (apply-property 'self-hosted '(news-aggregation-platform discussion-forum)))))
    

    (IDK why I chose lisp to represent this but it felt surprisingly natural.)

    What this says is that this sentence explains the concept of lemmy by equating it with the concept of a software which facilitates the combination of multiple purposes.

    A language-specific “renderer” such as the English one would then take this abstract representation and turn it into an English sentence:

    The concept of an explanation of a thing would then be turned into an explanation sentence. Explanation sentences depend on what it is that is being explained. In this case, the subject is specifically marked as a proper noun which is usually explained using a structure like “<explained thing> is <explanation>”. (An explanation for a different type of word could use a different structure.) Because it’s a proper noun and at the beginning of a sentence, “Lemmy” would be capitalised.

    Next the explanation part which is declared as a concept of being software of the kind FOSS facilitating some purpose. The combined concept of an object and its purpose is represented as “<object> for the purpose of <purpose>” in English. The object is FOSS here and specifically a software facilitating some purpose, so the English “renderer” can expand this into “free and open-source software for the purpose of facilitating <purpose>”.

    The purpose given is the purpose of having multiple purposes and this concept simply combines multiple purposes into one.
    The purposes are two objects to which a property has been applied. In English, the concept of applying a property is represented as “a <property as adjective> <object>”, so in this case “a self-hosted news-aggregation platform” and “a self-hosted online discussion forum”. These purposes are then combined using the standard English method of combining multiple objects which is listing them: “a self-hosted news-aggregation platform and a self-hosted online discussion forum”. Because both purposes have the same adjective applied, the English “renderer” would likely make the stylistic choice of implicitly applying it to both which is permitted in English: “a self-hosted news-aggregation platform and online discussion forum”.

    It would then be able to piece together this English sentence: “Lemmy is a free and open-source software for the purposes of facilitating a self-hosted news-aggregation platform and online discussion forum.”

    You could be even more specific in the abstract sentence in order to get exactly the original sentence but this is also a perfectly valid sentence for explaining Lemmy in English. All just from declaring concepts in an abstract way and transforming that abstract representation into natural language text using static rules.
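
    To make the walkthrough above a bit more concrete, here is a minimal sketch of what such an English “renderer” could look like. Everything in it is made up for illustration: the node classes, the `EN_LEXICON` table and `render_en` are hypothetical names, and the static rules are just the ones described above, nothing the WMF or anyone else has actually proposed.

```python
from dataclasses import dataclass


# Hypothetical abstract-concept nodes mirroring the s-expression above.
@dataclass(frozen=True)
class ProperNoun:
    name: str


@dataclass(frozen=True)
class WithProperty:
    """One property applied to several objects (apply-property)."""
    prop: str
    objects: tuple[str, ...]


@dataclass(frozen=True)
class SoftwareFacilitating:
    kind: str
    purpose: WithProperty


@dataclass(frozen=True)
class Explanation:
    subject: ProperNoun
    explanation: SoftwareFacilitating


# English-only wording for each concept identifier (assumed, illustrative).
EN_LEXICON = {
    "FOSS": "a free and open-source software",
    "self-hosted": "self-hosted",
    "news-aggregation-platform": "news-aggregation platform",
    "discussion-forum": "online discussion forum",
}


def render_en(node) -> str:
    """Turn an abstract node into English text using static rules."""
    if isinstance(node, Explanation):
        # Proper nouns are explained as "<explained thing> is <explanation>".
        return f"{render_en(node.subject)} is {render_en(node.explanation)}."
    if isinstance(node, ProperNoun):
        # Proper nouns are capitalised in English.
        return node.name.capitalize()
    if isinstance(node, SoftwareFacilitating):
        # "<object> for the purposes of facilitating <purpose>"
        purpose = render_en(node.purpose)
        return f"{EN_LEXICON[node.kind]} for the purposes of facilitating {purpose}"
    if isinstance(node, WithProperty):
        # Stylistic choice: state the shared adjective once for all objects.
        # Joining with "and" is only adequate for short lists like this one.
        nouns = " and ".join(EN_LEXICON[obj] for obj in node.objects)
        return f"a {EN_LEXICON[node.prop]} {nouns}"
    raise TypeError(f"no English rule for {type(node).__name__}")


# The abstract sentence about Lemmy from above.
LEMMY = Explanation(
    subject=ProperNoun("lemmy"),
    explanation=SoftwareFacilitating(
        kind="FOSS",
        purpose=WithProperty(
            prop="self-hosted",
            objects=("news-aggregation-platform", "discussion-forum"),
        ),
    ),
)

print(render_en(LEMMY))
```

    A German or Dutch “renderer” would be a second function over the same `LEMMY` value with its own lexicon and rules; the abstract sentence itself would not change.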







  • Somewhere inside that abstraction you’ll need to have the pieces of info that Spanish “leche” [milk] is feminine, that Zulu “ubisi” [milk] is class 11, that English predicative uses the ACC form, so goes on.

    Of course you do. The beauty of abstraction is that these language-specific parts can be factored into generic language-specific components. The information you’re actually trying to convey can be denoted without any language-specific parts or exceptions and that’s the important part for Wikipedia’s purpose of knowledge preservation and presentation.

    you’ll need people to mark a multitude of distinctions in their sentences, when writing them down, that the abstraction layer would demand for other languages. Such as tagging the “I” in “I see a boy” as “+masculine, +older-person, +informal” so Japanese correctly conveys it as “ore” instead of “boku”, “atashi”, “watashi” etc.

    For writing a story or prose, I agree.

    For the purpose of writing Wikipedia articles, this specifically and explicitly does not matter very much. Wikipedia strives to have one unified way of writing within a language. Whether the “I” is masculine or not would be a parameter that would be applied to all text equally (assuming I-narrator was the standard on Wikipedia).

    Even the idea of “abstract concept of milk” doesn’t work as well as it sounds like, because languages will split even the abstract concepts in different ways. For example, does the abstract concept associated with a living pig include its flesh?

    If your article talks about the concept of a living pig in some way and in the context of that article, it doesn’t matter whether the flesh is included, then you simply use the default word/phrase that the language uses to convey the concept of a pig.

    If it did matter, you’d explicitly describe the concept of “a living pig with its flesh” instead of the more generic concept of a living pig. If that happened to be the default of the target language or the target language didn’t differentiate between the two concepts, both concepts would turn into the same terms in that specific language.

    The same applies to your example of the different forms of “I” in Japanese. To create an appropriate Japanese “rendering” of an abstract sentence, you’d use the abstract concept of “a nerdy shy kid refers to itself” as e.g. the subject. The Japanese language “renderer” would turn that into a sentence like ”僕は。。。” while the English “renderer” would simply produce “I …”.
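
    As a rough sketch of how that could work mechanically: each language only lists the concepts it actually distinguishes, and anything more specific falls back to the nearest more generic concept. The concept names, the `PARENT` hierarchy and `word_for` are all hypothetical, and the lexicons are tiny on purpose.

```python
# Hypothetical concept hierarchy: specific concept -> more generic concept.
PARENT = {
    "pig-meat": "pig",
    "living-pig": "pig",
    "self-reference/shy-boy": "self-reference",
}

# Per-language lexicons. A language only needs an entry where it actually
# makes a distinction; everything else falls back to the generic concept.
LEXICON = {
    "en": {"pig": "pig", "pig-meat": "pork", "self-reference": "I"},
    "es": {"pig": "cerdo", "self-reference": "yo"},
    "ja": {
        "pig": "豚",
        "pig-meat": "豚肉",
        "self-reference": "私",
        "self-reference/shy-boy": "僕",
    },
}


def word_for(concept: str, lang: str) -> str:
    """Pick the most specific word the language has for this concept."""
    lexicon = LEXICON[lang]
    current = concept
    while current is not None:
        if current in lexicon:
            return lexicon[current]
        # The language does not distinguish this concept; try its parent.
        current = PARENT.get(current)
    raise KeyError(f"{lang!r} has no word for {concept!r}")


for lang in ("en", "es", "ja"):
    print(lang, word_for("pig-meat", lang), word_for("self-reference/shy-boy", lang))
```

    English and Japanese have their own word for the meat while Spanish falls back to the generic word, and only Japanese makes use of the extra detail on the self-reference.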

    A language is not an agent; it doesn’t “do” something. You’d need people to actively insert those pieces of info for each language, that’s perhaps doable for the most spoken ones, but those are the ones that would benefit the least from this.

    Yes, of course they would have to do that. The cool thing is that this would only have to be done once in a generic manner and from that point on you could use that definition to “render” any abstract article into any language you like.

    You must also keep in mind that this effort has to be measured relative to the alternatives. In this case, the alternative is to translate each and every article and all changes done to them into every available language. At the scale of Wikipedia, that is not an easy task and it’s been made clear that that’s simply not happening.

    (Okay, another alternative would be to remain on the status quo with its divergent versions of what are supposed to be the same articles containing the same information.)


  • Languages simply don’t agree on how to split the usage of words. Or grammatical case. Or if, when and how to do agreement.

    Just for the sake of example: how are they going to keep track of case in a way that doesn’t break Hindi, or Basque, or English, or Guarani? Or grammatical gender for a word like “milk”? (not even the Romance languages agree in it.) At a certain point, it gets simply easier to write the article in all those languages than to code something to make it for you.

    I don’t know what the WMF is planning here but what you’re pointing out is precisely what abstraction would solve.

    If you had an abstract way to represent a sentence, you would be independent of any one order or case or whatever other grammatical feature. In the end you obviously do need actual sentences with these features. To get these, you’d build a mechanism that would convert the abstract sentence representation into concrete sentences for specific languages that are correctly constructed according to those specific languages’ rules.

    Same with gender. What you’d store would not be that e.g. some German sentence is talking about the feminine milk but rather that it’s talking about the abstract concept of milk. How exactly that abstract concept is represented in words would then be up to individual languages to decide.
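
    A tiny sketch of what I mean, assuming a made-up per-language noun table (`NOUNS`) and article table (`DEFINITE_ARTICLE`). It only handles the definite article in the nominative singular and ignores things like French elision, so treat it as an illustration of where the gender lives rather than as a real grammar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Noun:
    word: str
    gender: Optional[str]  # None for languages without grammatical gender


# Language-specific knowledge about the abstract concept "milk".
# The stored article only ever references the concept, never the gender.
NOUNS = {
    "de": {"milk": Noun("Milch", "f")},
    "es": {"milk": Noun("leche", "f")},
    "fr": {"milk": Noun("lait", "m")},
    "en": {"milk": Noun("milk", None)},
}

# Definite articles, nominative singular only (a deliberate simplification).
DEFINITE_ARTICLE = {
    "de": {"m": "der", "f": "die", "n": "das"},
    "es": {"m": "el", "f": "la"},
    "fr": {"m": "le", "f": "la"},
    "en": {None: "the"},
}


def definite(concept: str, lang: str) -> str:
    """Render 'the <concept>' with the article agreeing in gender."""
    noun = NOUNS[lang][concept]
    article = DEFINITE_ARTICLE[lang][noun.gender]
    return f"{article} {noun.word}"


for lang in ("de", "es", "fr", "en"):
    print(lang, definite("milk", lang))
```

    The gender is a property of the German, Spanish or French word and stays inside that language’s tables; the abstract sentence never has to mention it.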

    I have absolutely no idea whether what I’m talking about here would be practical to implement but in theory it could work.