add unicode upcase, downcase #2547
@liquidaty - Thanks for keeping this issue alive. Given that jq already has ascii_upcase and ascii_downcase, wouldn't it be preferable to use the names Also, perhaps you could trigger a restart of the automated "checks"? |
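For context on why the ASCII-only builtins are not enough, here is a minimal sketch in C. The helper `ascii_upcase_bytes` is hypothetical and is not jq's actual implementation; it only assumes that ASCII upcasing touches the bytes `a` through `z` and nothing else, so a non-ASCII letter such as "é" (UTF-8 bytes C3 A9) passes through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not jq's code): upcase only the ASCII letters
 * a-z of a NUL-terminated UTF-8 byte string, in place. Bytes of
 * multi-byte sequences are all >= 0x80, so they never match a-z and
 * are left exactly as they were. */
void ascii_upcase_bytes(char *s) {
  assert(s != NULL);
  for (; *s != '\0'; s++) {
    if (*s >= 'a' && *s <= 'z')
      *s = (char)(*s - 'a' + 'A');
  }
}
```

Applied to "café", this yields "CAFé" rather than "CAFÉ": producing "É" (UTF-8 bytes C3 89) needs a Unicode case-mapping table, which is what a library such as utf8proc would supply.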
thank you @pkoppstein -- absolutely, will do. Your suggested names sound fine to me. I realized after submitting the last PR state that the autoconf did not properly work with regard to libutf8proc. I've already done the work (outside of the PR) to handle that so that will be easy to update. The approach for libutf8proc will be the same as for Oniguruma, which is to include it as a submodule in |
What would the main “cons” be in including libutf8proc unconditionally? Do we have any numbers at this point? |
No cons in my view, unless someone had some preference to use their own version of libutf8proc or anyone wants the option to have a (very slightly) lighter-weight At the moment the updated PR I'm prepping already makes it optional, defaulting to yes, and is basically ready to go (I just want to add a simple test before re-submitting), so easy to change it to unconditional |
Assuming we’re ok with the license, I’d suggest NOT making its inclusion conditional, partly because the functionality that will be enabled is so central to Unicode and thus JSON, and because it’s already such a pain having even one external library optional (e.g. w.r.t. man.test). |
add one simple test to jq.test for each
include utf8proc files directly in repo as an unconditional part of the build
update COPYING to include utf8proc license
I agree with @pkoppstein to make it non-optional as it feels fundamental. Only cons i can come up with is if it makes libjq more complicated to link? should we namespace the symbols somehow? Nice addition! |
OK should be all set now, utf8proc is included unconditionally (hence no need to modify autoconf), filter names changed to |
I've not reviewed this, but yeah, we need this! (We probably also need normalization and form-insensitive string comparisons, but that's for another issue/PR and another time.) |
Actually, please do not merge this just yet. It occurs to me that the current way of unconditionally including libutf8 is to simply include the utf8proc.c into the source code. However, this may cause issues if jq is used as a library together with other code that links the utf8proc library. I will update this PR to unconditionally include libutf8 but still providing an option for the user to specify their own |
Can of worms opening alert :) There was an issue asking for us to make sure all non-static symbols are named So I'm inclined to agree that any external library sources that we fork right into Whereas if we use some external library and import it as a submodule like we did with Oniguruma, then users who want static linking would probably not use the submodules when building So I'm further inclined to believe that if we can treat libutf8 like Oniguruma, that's probably better, and we might be able to then avoid opening this can of worms. Though we should rename our non-static symbols that don't start with |
Couldn't we just add See https://juliastrings.github.io/utf8proc/doc/utf8proc_8h.html Having recently tried and failed to make a "-static" version of jq, my sense is that the benefits of including the library unconditionally outweigh the drawbacks. If it's good enough for Julia, it's good enough for jq. |
Anyone building a busybox-like thing will probably want jq's internal libutf8 to not conflict with other utilities'. Is libutf8 something we can link with and -if need be- include as a submodule, just like Oniguruma?
True for you, and true for me. I'm not very sympathetic to users who want to link statically, but I am willing to do a bit for them. We could have a header that can be included to rename all the relevant symbols, say. I really wish that the various binutils-like tools out there could learn ELF semantics for static linking... (Specifically:
Then we'd not have to consider these silly symbol naming issues.) |
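The "header that renames all the relevant symbols" idea mentioned above could look roughly like the following sketch in C. Everything here is illustrative: `demo_toupper` stands in for a function of a vendored library, and `jq_private_demo_toupper` is a made-up prefixed name. Neither is a real utf8proc or jq symbol.

```c
#include <assert.h>

/* Hypothetical renaming header contents: any translation unit that sees
 * this macro before the vendored sources will compile the function
 * under the prefixed name instead. */
#define demo_toupper jq_private_demo_toupper

/* Stand-in for a vendored library function. After preprocessing, the
 * symbol the linker sees is jq_private_demo_toupper, so it cannot
 * collide with another copy of the library that exports demo_toupper. */
int demo_toupper(int codepoint) {
  if (codepoint >= 'a' && codepoint <= 'z')
    return codepoint - 'a' + 'A';
  return codepoint;
}
```

Callers that include the same header keep writing `demo_toupper(...)` and are transparently redirected. The cost is that the rename list has to be kept in sync with the vendored library's exported names.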
Anyways, for now I'll take the inclusion of this utf8 library and forget about symbol renaming. |
Looking at the header, it looks like we can also get normalization and other Unicode functionality with this! |
I've not reviewed the code in |
allow custom utf8proc location; search default prefix if not found, else use builtin
omit utf8proc from libjq
changed original modules/utf8proc/Makefile to omit checks that require ruby to be installed
I've updated the PR with a proposed way to thread the needle as follows (apologies for the weird path of merging master into the branch, that was a workaround for an originally ill-targeted commit in the first attempt to update the PR):
This way:
Some downsides:
As an aside, I believe this will address any issues building a |
libjq.so needs to be a thing though (I use it at $WORK!).
|
I'd be OK with the UTF-8 builtins being added by the jq executable and not being in libjq, but I'd rather they be in libjq.
|
OK, maybe we can have our cake and eat it. Updated configure.ac and Makefile.am to support |
@liquidaty - I hope you haven't run out of steam! Is there some way I can help push this over the finishing line? It would be a great shame if all your work didn't make it into jq 1.7. |
@pkoppstein have been getting pulled away but will take a look this week, hopefully on the earlier side (I'm also entirely fine with maintainers making any of the contemplated changes, in case anyone prefers that which sometimes is less work than chasing the change to begin with...) |
@liquidaty - I noticed that you recently merged |
My apologies, I assume from your note however that there are other ways that I'm not aware of. I will have to ask for assistance next time the "Resolve conflict" button pops up for me then, as I don't know how else to resolve. If there are further steps I should take at this time to get it to the way you need/want it to be, please feel free to lmk. |
@liquidaty just do this in the future |
@nicowilliams - What would you suggest @liquidaty do to undo the damage? Start a new branch? |
No need to start a new branch. Just |
OK I can do that. I also thought it preferable to consolidate all the changes for this PR into the following 3 chunks. that sound ok? I used the following commands to rebase:
and the rebase consolidated the commit history into:
(I'm still figuring out how to clean up the multi-line comments...) |
Tomorrow I'll try to rebase and clean up the branch's history and force-push to the PR, and I'll include a log of what I did. |
Possible to include this in the upcoming 1.7 release? |
@itchyny @nicowilliams @pkoppstein et al: great job re release 1.7. Coming back to this issue: I had (apparently mistakenly) thought it would be in 1.7 given the prior discussion and the state of the pull which afaik is totally ready to go. Could we put this issue on a definitive roadmap to be merged? |
@liquidaty - Rebasing might be necessary. @itchyny - Once this PR has been prepared for merging, could you please help ensure it’s merged before it becomes outdated again? Many of us have been looking forward to the inclusion of this functionality in a master version of jq, and now that 1.7 has been released, this would be an excellent time, not least because it would allow for any imperfections to be resolved calmly. Thank you. |
ufff, this one is soooo close. Great work @liquidaty! Hopefully a maintainer can get some time to bash out the last few changes needed. |
@bitsondatadev it was fully ready to go well before 1.7 and at this point I'm resigned to just using my own modified version of jq and not waiting for it to ever be officially merged. I'd be happy to get it in release-ready shape again if the maintainers are willing to provide some indication that the effort won't just be ignored, but otherwise I'm probably just never going to use anything past my own modified version 1.6 |
Hello i did a rebase, squash, cleanup and resolved a bunch of review comments and pushed a variant of this here https://github.com/wader/jq/commits/2547-resolve-jq.test-cleanup/ should i push it to the PR branch? |
FWIW I don't believe it's my decision to make but if I'm wrong about that then yes please |
👍 could you take a quick peek before i push to see if things look sane? i'm mostly concerned about configure.ac and Makefile.am which have quite a lot of messy conflicts. |
sure will do. imv the configure.ac changes generally should be fine if it still passes all tests; a separate related question would be, do we want to add more tests to verify the different options that configure.ac now offers related to this feature? I generally prefer a "don't write it if you don't use / test it" principle, but in this case I would override that and vote "no" for various reasons I won't expand on unless folks disagree... |
TODO:
|
Is this PR alive? |
@wader apologies for the belated response. looks good to me. thank you! |
@liquidaty no worries! @git-developer I think the PR is stalled/blocked on the TODOs i mentioned above. Someone has to do some work/decide about those points. I'm leaning towards somehow embedding the utf8proc source for simplicity, less cmake, |
See #492