Changeset 11


Timestamp:
10/20/15 13:18:19 (9 years ago)
Author:
pinsard
Message:

typo: more generic trac pages selection, describe how to check typo in attached PDF on wiki pages

Location:
trunk/docs/manual
Files:
1 added
3 edited

  • trunk/docs/manual/for_typo/institution.txt

    r9 r11  
     1Institut 
     2Hadley 
    13CCRT 
    24ccrt 
  • trunk/docs/manual/for_typo/nemo.txt

    r9 r11  
     1GYRE 
    12AGRIF 
    23NEMO 
  • trunk/docs/manual/source/developers/guides/typo.rst

    r9 r11  
    9292`<http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/>`_ 
    9393 
    94 By convention, it has been decided to start all |igcmg_doc| pages names by ``Doc``. 
    95  
    96 To get all wiki Doc* pages URI of `<http://forge.ipsl.jussieu.fr/igcmg_doc/>`_ [#tracoops]_: 
    97  
    98 .. code-block:: bash 
    99  
    100    excluded_uri=DocYgraphvizLibigcmprod 
    101    list_uri=$(xsltproc \-\-novalid \ 
     94By convention, it has been decided to start all |igcmg_doc| page names 
     95with ``Doc``, but there are exceptions such as Train, WikiStart, etc. 
     96 
     97To get the URIs of all hand-written wiki pages of 
     98`<http://forge.ipsl.jussieu.fr/igcmg_doc/>`_ [#tracoops]_: 
     99 
     100.. code-block:: bash 
     101 
     102   excluded_href=DocYgraphvizLibigcmprod 
     103   list_href=${PROJECT_LOG}/list_href 
     104   xsltproc \-\-novalid \ 
    102105   ${PROJECT}/docs/manual/for_tracwiki/titleindex.xsl http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/TitleIndex | \ 
    103    grep "/igcmg_doc/wiki/Doc" | grep -v ${excluded_uri} | sort -u | \ 
    104    sed -e "s@^@http://forge.ipsl.jussieu.fr/@") 
     106   grep "/igcmg_doc/wiki/" | grep -v "?action" | grep -v "?format" | \ 
     107   grep -v ${excluded_href} | sort -u | \ 
     108   sed -e "s@^@http://forge.ipsl.jussieu.fr@" > ${list_href} 
     109   trac_uri=${PROJECT_LOG}/trac_uri 
     110   sed -e "s@^@http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/@"  ${PROJECT}/docs/manual/for_tracwiki/tracpages.txt | sort -u > ${trac_uri} 
     111   list_uri=$(comm  -13 ${trac_uri} ${list_href}) 
    105112 
    106113.. [#tracoops] we exclude DocYgraphvizLibigcmprod because Trac detected an internal error ... some graphviz trac plugins issue 
     
    110117.. code-block:: bash 
    111118 
    112    dirhtml=${PROJECT_LOG} 
    113    rm -f ${dirhtml}/Doc* 
     119   dirhtml=${PROJECT_LOG}/html/ 
     120   rm -fr ${dirhtml} 
     121   mkdir ${dirhtml} 
    114122   for uri in ${list_uri} 
    115123   do 
     
    122130 
    123131   cd ${PROJECT_LOG} 
    124    listf=$(find ${dirhtml} -name "Doc*") 
     132   listf=$(find ${dirhtml} -type f) 
    125133   hunspell_out=${PROJECT_LOG}/hunspell_out 
    126134   hunspell_out_uniq=${PROJECT_LOG}/hunspell_out_uniq 
     
    141149- typo to be fixed in wiki pages 
    142150- false positive to be added in a :file:`docs/manual/for_typo/` 
    143 - false positif to be ignored because to hard to add (encoding issue for 
     151- false positive to be ignored because too hard to add (encoding issue for 
    144152  Greek word, etc.) 
    145153 
     
    163171 
    164172   w=amonch # take a real one from ${hunspell_out_uniq} 
    165    find ${dirhtml} -name "Doc*" -exec grep -Hi ${w} {} \; 
     173   find ${dirhtml} -type f -exec grep -Hi ${w} {} \; 
    166174 
    167175.. note:: 
     
    174182 
    175183   Corrections have to be made via the wiki interface of the forge. 
     184 
     185Check typo in attached PDF on wiki pages 
     186++++++++++++++++++++++++++++++++++++++++ 
     187 
     188To get the URIs of all attached files: 
     189 
     190.. code-block:: bash 
     191 
     192   listf=$(find ${dirhtml} -type f) 
     193   list_attached=${PROJECT_LOG}/list_attached 
     194   list=${PROJECT_LOG}/list 
     195   rm -f ${list_attached} ${list} 
     196   for onefile in ${listf} 
     197   do 
     198      xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -t -m "//x:a[@title='Download']" -v "concat('http://forge.ipsl.jussieu.fr/',@href)" -n ${onefile} >> ${list} 
     199   done 
     200   sort -u ${list} >  ${list_attached} 
     201 
     202To isolate the PDF files among these URIs: 
     203 
     204.. code-block:: bash 
     205 
     206   list_pdf=$(grep "\.pdf$" ${list_attached}) 
     207 
     208To download those URIs locally: 
     209 
     210.. code-block:: bash 
     211 
     212   dirpdf=${PROJECT_LOG}/pdf/ 
     213   rm -rf ${dirpdf} 
     214   for uri in ${list_pdf} 
     215   do 
     216       wget -P ${dirpdf} ${uri} 
     217   done 
     218 
     219We can now convert these PDF files to text files: 
     220 
     221.. code-block:: bash 
     222 
     223   list_pdf=$(find ${dirpdf} -type f) 
     224   for pdf in ${list_pdf} 
     225   do 
     226       pdftotext ${pdf} ${pdf}.txt 
     227   done 
     228 
     229We can now check for typos in the text files: 
     230 
     231.. code-block:: bash 
     232 
     233   cd ${PROJECT_LOG} 
     234   listf=$(find ${dirpdf} -type f -name "*.pdf.txt") 
     235   hunspell_out=${PROJECT_LOG}/hunspell_out 
     236   hunspell_out_uniq=${PROJECT_LOG}/hunspell_out_uniq 
     237   rm -f ${hunspell_out}  ${hunspell_out_uniq} 
     238   for onefile in ${listf} 
     239   do 
     240      LC_ALL=C;hunspell -d en_US,nontypo_uniq --check-url -i utf-8 -l < ${onefile} >> ${hunspell_out} 
     241   done 
     242   sort -u ${hunspell_out} | sort --ignore-case > ${hunspell_out_uniq} 
     243 
     244:file:`${hunspell_out_uniq}` contains: 
     245 
     246- typos to be fixed in PDF files 
     247- false positives to be added to a file in :file:`docs/manual/for_typo/` 
     248- false positives to be ignored because of bad conversion from PDF to text 
     249  (i.e. ligatures) or because they are too hard to add (encoding issues for Greek words, etc.) 
     250 
     251To find one of the misspellings in the converted PDF files: 
     252 
     253.. code-block:: bash 
     254 
     255   w=infrastucture # take a real one from ${hunspell_out_uniq} 
     256   find ${dirpdf} -type f -name "*.pdf.txt" -exec grep -Hi ${w} {} \; 
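
The deduplicate-then-filter step that this changeset adds for attached files can be tried offline. The following is a minimal sketch, not the committed procedure: the ``forge.example`` URIs and the temporary working directory are hypothetical stand-ins for the output of the ``xmlstarlet`` loop and for ``${PROJECT_LOG}``.

```shell
#!/bin/bash
# Hypothetical sample data: on the real forge these URIs would come from
# the xmlstarlet loop over the downloaded HTML pages.
workdir=$(mktemp -d)            # stand-in for ${PROJECT_LOG} (assumption)
list="${workdir}/list"
list_attached="${workdir}/list_attached"

printf '%s\n' \
  "http://forge.example/raw-attachment/wiki/DocA/guide.pdf" \
  "http://forge.example/raw-attachment/wiki/DocA/run.png" \
  "http://forge.example/raw-attachment/wiki/DocB/guide.pdf" \
  "http://forge.example/raw-attachment/wiki/DocA/guide.pdf" > "${list}"

# Remove duplicate URIs (the same attachment may be linked several times).
LC_ALL=C sort -u "${list}" > "${list_attached}"

# Keep only the URIs that end in ".pdf".
list_pdf=$(grep "\.pdf$" "${list_attached}")

echo "${list_pdf}"
rm -rf "${workdir}"
```

With this sample input, only the two distinct ``guide.pdf`` URIs remain in ``list_pdf``. Note that the pattern is case sensitive, so an attachment named ``REPORT.PDF`` would be skipped unless ``grep -i`` is used.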