Debian Bug report logs - #788822
mecab: Could we use unidic-mecab by default?
Reported by: Leonardo Boiko <leoboiko@gmail.com>
Date: Mon, 15 Jun 2015 12:00:02 UTC
Severity: wishlist
Fixed in version unidic-mecab/2.3.0+dfsg-4
Done: Osamu Aoki <osamu@debian.org>
Bug is archived. No further changes may be made.
Report forwarded
to debian-bugs-dist@lists.debian.org, leoboiko@gmail.com, TSUCHIYA Masatoshi <tsuchiya@namazu.org>:
Bug#788822; Package mecab.
(Mon, 15 Jun 2015 12:00:05 GMT)
Acknowledgement sent
to Leonardo Boiko <leoboiko@gmail.com>:
New Bug report received and forwarded. Copy sent to leoboiko@gmail.com, TSUCHIYA Masatoshi <tsuchiya@namazu.org>.
(Mon, 15 Jun 2015 12:00:05 GMT)
Message #5 received at submit@bugs.debian.org:
Package: mecab
Version: 0.996-1.1
Severity: wishlist
Jim Breen (of EDICT fame) tells me that ipadic is considered to be
outdated by now, and all the people into natural language processing
are now using the higher-quality dictionary unidic (which is based on
the NINJAL/kokugo kenkyujo corpus). Apparently even ipadic's authors
don't recommend that it be used today.
Unidic is already in jessie (package unidic-mecab), so can we perhaps
use it as the default alternative for mecab-dictionary?
-- System Information:
Debian Release: 8.0
APT prefers stable
APT policy: (990, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 3.18.1 (SMP w/2 CPU cores)
Locale: LANG=ja_JP.UTF-8, LC_CTYPE=ja_JP.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to ja_JP.UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages mecab depends on:
ii libc6 2.19-18
ii libgcc1 1:4.9.2-10
ii libmecab2 0.996-1.1
ii libstdc++6 4.9.2-10
ii mecab-ipadic 2.7.0-20070801+main-1
ii mecab-jumandic 5.1+20070304-3
mecab recommends no packages.
mecab suggests no packages.
-- no debconf information
Information forwarded
to debian-bugs-dist@lists.debian.org, Natural Language Processing, Japanese <pkg-nlp-ja-devel@lists.alioth.debian.org>:
Bug#788822; Package mecab.
(Sun, 29 Jan 2017 06:24:02 GMT)
Acknowledgement sent
to Hideki Yamane <henrich@debian.or.jp>:
Extra info received and forwarded to list. Copy sent to Natural Language Processing, Japanese <pkg-nlp-ja-devel@lists.alioth.debian.org>.
(Sun, 29 Jan 2017 06:24:02 GMT)
Message #10 received at 788822@bugs.debian.org:
Control: reassign -1 mecab-unidic
On Mon, 15 Jun 2015 08:57:48 -0300 Leonardo Boiko <leoboiko@gmail.com> wrote:
> Jim Breen (of EDICT fame) tells me that ipadic is considered to be
> outdated by now, and all the people into natural language processing
> are now using the higher-quality dictionary unidic (which is based on
> the NINJAL/kokugo kenkyujo corpus). Apparently even ipadic's authors
> don't recommend that it be used today.
>
> Unidic is already in jessie (package unidic-mecab), so can we perhaps
> use it as the default alternative for mecab-dictionary?
./mecab-jumandic/debian/mecab-jumandic-utf8.postinst:priority=40
./mecab-naist-jdic/debian/mecab-naist-jdic-eucjp.postinst:priority="90"
./mecab-naist-jdic/debian/mecab-naist-jdic.postinst:priority="100"
./naist-jdic/debian/naist-jdic-utf8.postinst:priority="80"
./naist-jdic/debian/naist-jdic.postinst:priority="70"
./unidic-mecab/debian/postinst:priority="100"
Then, set priority=30 in unidic-mecab is better, right?
--
Regards,
Hideki Yamane henrich @ debian.or.jp/org
http://wiki.debian.org/HidekiYamane
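The priorities listed in this message matter because, in automatic mode, update-alternatives points the link at the alternative with the highest priority. The following is a minimal sketch of that comparison only, not of update-alternatives itself. The function name top_alternatives and the "name priority" input format are illustrative assumptions; the numbers are the ones quoted from the postinst files above. The sketch does not model how a tie is resolved; it only reports it.

```shell
#!/bin/sh
# Print every "name priority" pair that holds the highest priority.
# More than one output line means the top priority is shared (a tie).
# top_alternatives is a hypothetical helper, not part of any Debian tool.
top_alternatives() {
    awk '
        {
            name[NR] = $1
            prio[NR] = $2 + 0
            if (NR == 1 || prio[NR] > max) max = prio[NR]
        }
        END {
            for (i = 1; i <= NR; i++)
                if (prio[i] == max) print name[i], prio[i]
        }
    '
}

# Priorities as quoted from the postinst files in this message.
top_alternatives <<'EOF'
mecab-jumandic-utf8 40
mecab-naist-jdic-eucjp 90
mecab-naist-jdic 100
naist-jdic-utf8 80
naist-jdic 70
unidic-mecab 100
EOF
```

With these values the sketch prints two lines, mecab-naist-jdic and unidic-mecab, both at 100, so no single dictionary has a strictly higher priority than all the others.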
Bug reassigned from package 'mecab' to 'mecab-unidic'.
Request was from Hideki Yamane <henrich@debian.or.jp>
to 788822-submit@bugs.debian.org.
(Sun, 29 Jan 2017 06:24:02 GMT)
No longer marked as found in versions mecab/0.996-1.1.
Request was from Hideki Yamane <henrich@debian.or.jp>
to 788822-submit@bugs.debian.org.
(Sun, 29 Jan 2017 06:24:03 GMT)
Information forwarded
to debian-bugs-dist@lists.debian.org, Natural Language Processing (Japanese) <team+pkg-nlp-ja@tracker.debian.org>:
Bug#788822; Package unidic-mecab.
(Tue, 19 Feb 2019 13:27:02 GMT)
Acknowledgement sent
to Osamu Aoki <osamu@debian.org>:
Extra info received and forwarded to list. Copy sent to Natural Language Processing (Japanese) <team+pkg-nlp-ja@tracker.debian.org>.
(Tue, 19 Feb 2019 13:27:02 GMT)
Message #21 received at 788822@bugs.debian.org:
Hi,
Yamane-san
> Then, set priority=30 in unidic-mecab is better, right?
No. We need a priority above 100 to ensure it is preferred over naist-jdic (UTF-8).
$ sudo update-alternatives --config mecab-dictionary
There are 6 choices for the alternative mecab-dictionary (providing /var/lib/mecab/dic/debian).
  Selection    Path                                  Priority   Status
------------------------------------------------------------
* 0            /var/lib/mecab/dic/unidic              100       auto mode
  1            /var/lib/mecab/dic/ipadic              70        manual mode
  2            /var/lib/mecab/dic/ipadic-utf8         80        manual mode
  3            /var/lib/mecab/dic/juman-utf8          40        manual mode
  4            /var/lib/mecab/dic/naist-jdic          100       manual mode
  5            /var/lib/mecab/dic/naist-jdic-eucjp    90        manual mode
  6            /var/lib/mecab/dic/unidic              100       manual mode
But why is unidic not the default dictionary? ... Alas, the packaging lacks
a step that installs the binary dictionary, so I see:
$ ls -la /var/lib/mecab/dic/unidic
total 8
drwxr-xr-x 2 root root 4096 Dec 2 18:27 .
drwxr-xr-x 8 root root 4096 Feb 19 00:22 ..
Nothing. We could create the binary dictionary data in postinst with
/usr/lib/mecab/mecab-dict-index -d ${srcdir} -o ${dstdir} -t ${encoding}
but since this is going to be a huge CPU load, we may as well install the
binary dictionaries from the upstream package, by tweaking debian/install and
adding debian/links so that dicrc is available on both the /usr/... and /var/... sides.
Hmmm... since I am a member of the nlp team, I think I can update ...
Yamane-san may be busy....
Anyway, this is a 7GB unzipped tarball. This is absolutely the biggest deb.
Building the deb may take time ...
Osamu
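For reference, the postinst approach that this message weighs and then sets aside in favour of shipping prebuilt binaries could look roughly like the sketch below. It is not the package's actual maintainer script. The default values of SRCDIR, DSTDIR and ENCODING are assumptions (the message only gives ${srcdir}, ${dstdir} and ${encoding} placeholders), and build_dictionary is a hypothetical helper name. The only real command used is the mecab-dict-index invocation quoted above; sys.dic is the compiled system dictionary file that mecab-dict-index writes.

```shell
#!/bin/sh
set -e

# Assumed locations and encoding; the thread does not state the real values.
SRCDIR="${SRCDIR:-/usr/share/mecab/dic/unidic}"
DSTDIR="${DSTDIR:-/var/lib/mecab/dic/unidic}"
ENCODING="${ENCODING:-utf-8}"
INDEXER="${INDEXER:-/usr/lib/mecab/mecab-dict-index}"

# Compile the dictionary unless a compiled sys.dic is already present.
# build_dictionary is a hypothetical helper for illustration.
build_dictionary() {
    if [ -f "$DSTDIR/sys.dic" ]; then
        echo "binary dictionary already present in $DSTDIR"
        return 0
    fi
    mkdir -p "$DSTDIR"
    # Same invocation as quoted in the message above.
    "$INDEXER" -d "$SRCDIR" -o "$DSTDIR" -t "$ENCODING"
}
```

A maintainer script would call build_dictionary from its configure step. The cost noted in the message applies here: the compilation would run on every machine that installs the package, which is why installing upstream's prebuilt binaries was preferred.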
Reply sent
to Osamu Aoki <osamu@debian.org>:
You have taken responsibility.
(Thu, 21 Feb 2019 12:15:34 GMT)
Notification sent
to Leonardo Boiko <leoboiko@gmail.com>:
Bug acknowledged by developer.
(Thu, 21 Feb 2019 12:15:34 GMT)
Message #26 received at 788822-close@bugs.debian.org:
Source: unidic-mecab
Source-Version: 2.3.0+dfsg-4
We believe that the bug you reported is fixed in the latest version of
unidic-mecab, which is due to be installed in the Debian FTP archive.
A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to 788822@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
Osamu Aoki <osamu@debian.org> (supplier of updated unidic-mecab package)
(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Format: 1.8
Date: Wed, 20 Feb 2019 21:53:07 +0900
Source: unidic-mecab
Binary: unidic-mecab
Architecture: source all
Version: 2.3.0+dfsg-4
Distribution: unstable
Urgency: medium
Maintainer: Natural Language Processing (Japanese) <team+pkg-nlp-ja@tracker.debian.org>
Changed-By: Osamu Aoki <osamu@debian.org>
Description:
unidic-mecab - Dictionary for Mecab (Corpus of Contemporary Written Japanese)
Closes: 788822 922766
Changes:
unidic-mecab (2.3.0+dfsg-4) unstable; urgency=medium
.
* Team upload.
* Install binary dictionary files. Closes: #788822
* Fix watch file. Closes: #922766
* Bump priority to 110 (naist-jdic=100)
* Add README.Debian
* Reduce build dependency by adding pos-id.def.
Checksums-Sha1:
663dfbf0564fa7a52044b26b102d51d66405508d 2018 unidic-mecab_2.3.0+dfsg-4.dsc
774fe14b7a95397d1a9cece1b84086acc336ef71 5808 unidic-mecab_2.3.0+dfsg-4.debian.tar.xz
ffb41e4e0c9ab3600cb968e6d2d7c7ce5484f7c3 838383036 unidic-mecab_2.3.0+dfsg-4_all.deb
0b5412c6c1c1d7957bb74684787df0408ba31ccc 5578 unidic-mecab_2.3.0+dfsg-4_amd64.buildinfo
Checksums-Sha256:
b010d041382ca861dffab22e49f4a4b3e37e91b1ec230b64d129c0caa90dab36 2018 unidic-mecab_2.3.0+dfsg-4.dsc
549dd5d35be52b64ba1010b95a90f2cea35bb763e5faadacda37bddbf748e66e 5808 unidic-mecab_2.3.0+dfsg-4.debian.tar.xz
3bff9942e8df92db215878241206148d5fa86849fcdb0ec4cf4f18ab2a032a67 838383036 unidic-mecab_2.3.0+dfsg-4_all.deb
5d60ab2c935ee774154655f0450b54d3f72c659adf4ec0aa16fdda6eb737aa21 5578 unidic-mecab_2.3.0+dfsg-4_amd64.buildinfo
Files:
ce29ab7ef332245ef92ecb4b5a7c5e1a 2018 misc optional unidic-mecab_2.3.0+dfsg-4.dsc
b9476d9a61911c677a2950687f609620 5808 misc optional unidic-mecab_2.3.0+dfsg-4.debian.tar.xz
43c0d9431535b5e91d8fd47ede9dda50 838383036 misc optional unidic-mecab_2.3.0+dfsg-4_all.deb
bc217bde77c3f2e6b7f55a8659053259 5578 misc optional unidic-mecab_2.3.0+dfsg-4_amd64.buildinfo
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEEMTNyTWIHiBV56V1iHhNWiB3Y15EFAlxtXesACgkQHhNWiB3Y
15Eh2w//YZbFKKoxeMK2cu+eHnvFdahZZY/+Q8UIvPTSq4Gj+eNhkro8uFNYx7iL
LXXYkd8iU4bR4WM1iJVUGx9n7XbsJahC1RY2t6ZkC2TeMBoy+2NogWVEzxDTqaMn
z9y4k4u/z6MLVNWxgxE4XZHxoajoiQVJ0GCa96eeuRt4rTSJmBtlexkq4rIKYiNL
xfVOwGjkdSA5GK/Z7jqh2hSuzYYfvtty8G8phKp9ImDgy1QO8OnYg9NG9LGMo7FC
tLRD6ArGYd6VbO0ShpOVajWQCnplV1Xr3Pc0Ize97Cy2fQH8ZLWvNC802ICrZ4Pw
CcdJyCSpH5TNV+SxlGhAK/z1psDoraHhkLuhjbyHPx/ahWHSG8xLFqu9zbMZboiF
Rwxo39doHIaQ3WP2mCf7bwhUpureU/Mz/oev5ZLUXj7fxykLzS9g4yBsmzfIarIV
Tsj/aXdQ22r1+hdE5W2UdrSwy9XdMz/6ZckMk+tJ13nc3QAr6qlXkQT3vaPf4Ugt
LFH8+YRu8e3VOIy+Dpx9zbRACQ6TPM4mON2RVXeAbaU4tEOFO+pot51sK0vTucXv
I4hn9vDffgrMsA6QetfJM/ZVvCborqTRrWJS2Oo7T9xBxGkeTIVzKEAiBTJ1xvTr
hrpf7nes+OSILHM2UPM0qndHwDKk/2nUx1ZCAGVM+wwU+ooBryw=
=lQXG
-----END PGP SIGNATURE-----
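The changelog entry "Bump priority to 110 (naist-jdic=100)" amounts to registering the unidic directory with a priority strictly above the other dictionaries. A minimal sketch of such a registration follows, assuming the link, name and path shown in the update-alternatives output earlier in this log. It is not a copy of the package's postinst; the RUN variable and the register_dictionary helper are hypothetical, added so the command can be printed instead of executed.

```shell
#!/bin/sh
set -e

# Values taken from this bug log.
LINK=/var/lib/mecab/dic/debian
NAME=mecab-dictionary
DICT=/var/lib/mecab/dic/unidic
PRIORITY=110

# Hypothetical dry-run hook: defaults to echo so nothing is changed.
# Run with RUN= (set but empty) as root to execute the command for real.
RUN="${RUN-echo}"

register_dictionary() {
    # update-alternatives --install <link> <name> <path> <priority>
    $RUN update-alternatives --install "$LINK" "$NAME" "$DICT" "$PRIORITY"
}

register_dictionary
```

In the default dry-run form this only prints the update-alternatives command line, which makes it safe to try on a system without the mecab packages installed.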
Bug archived.
Request was from Debbugs Internal Request <owner@bugs.debian.org>
to internal_control@bugs.debian.org.
(Fri, 22 Mar 2019 07:25:41 GMT)