
-                      r9
+                      r11
 `<http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/>`_
+By convention, it has been decided to start all |igcmg_doc| pages names by ``Doc``.
+To get all wiki Doc* pages URI of `<http://forge.ipsl.jussieu.fr/igcmg_doc/>`_ [#tracoops]_:
+.. code-block:: bash
+   excluded_uri=DocYgraphvizLibigcmprod
+   list_uri=$(xsltproc \-\-novalid \
+By convention, all |igcmg_doc| page names start with ``Doc``,
+but there are exceptions such as Train, WikiStart, etc.
+To get the URIs of all hand-written wiki pages of
+`<http://forge.ipsl.jussieu.fr/igcmg_doc/>`_ [#tracoops]_:
+.. code-block:: bash
+   excluded_href=DocYgraphvizLibigcmprod
+   list_href=${PROJECT_LOG}/list_href
+   xsltproc \-\-novalid \
    ${PROJECT}/docs/manual/for_tracwiki/titleindex.xsl http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/TitleIndex | \
+   grep "/igcmg_doc/wiki/Doc" | grep -v ${excluded_uri} | sort -u | \
+   sed -e "s@^@http://forge.ipsl.jussieu.fr/@")
+   grep "/igcmg_doc/wiki/" | grep -v "?action" | grep -v "?format" | \
+   grep -v ${excluded_href} | sort -u | \
+   sed -e "s@^@http://forge.ipsl.jussieu.fr@" > ${list_href}
+   trac_uri=${PROJECT_LOG}/trac_uri
+   sed -e "s@^@http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/@"  ${PROJECT}/docs/manual/for_tracwiki/tracpages.txt | sort -u > ${trac_uri}
+   list_uri=$(comm  -13 ${trac_uri} ${list_href})
 .. [#tracoops] we exclude DocYgraphvizLibigcmprod because Trac detected an internal error ... some graphviz trac plugins issue
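The ``comm -13`` step above keeps only the lines that appear in the second file and not in the first, which is how the pages shipped with Trac are removed from the full list. A minimal sketch of that behaviour follows; the page names and the temporary directory are made up for illustration and are not taken from the forge:

```shell
# Hypothetical sample data: these page names are illustrative only.
tmpdir=$(mktemp -d)
# Stand-in for ${trac_uri}: pages shipped with Trac. comm needs sorted input.
printf '%s\n' TracGuide TracInstall | LC_ALL=C sort -u > "${tmpdir}/trac_uri"
# Stand-in for ${list_href}: every page found in TitleIndex.
printf '%s\n' DocExample TracGuide TracInstall Train | LC_ALL=C sort -u > "${tmpdir}/list_href"
# -1 drops lines found only in the first file, -3 drops lines common to both,
# so only the lines unique to the second file remain.
list_uri=$(LC_ALL=C comm -13 "${tmpdir}/trac_uri" "${tmpdir}/list_href")
echo "${list_uri}"
rm -rf "${tmpdir}"
```

Both inputs are sorted with the same locale that ``comm`` runs in, since ``comm`` expects its inputs to be sorted consistently.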
 …
 .. code-block:: bash
+   dirhtml=${PROJECT_LOG}
+   rm -f ${dirhtml}/Doc*
+   dirhtml=${PROJECT_LOG}/html/
+   rm -fr ${dirhtml}
+   mkdir ${dirhtml}
    for uri in ${list_uri}
    do
 …
    cd ${PROJECT_LOG}
    listf=$(find ${dirhtml} -name "Doc*")
+   listf=$(find ${dirhtml} -type f)
    hunspell_out=${PROJECT_LOG}/hunspell_out
    hunspell_out_uniq=${PROJECT_LOG}/hunspell_out_uniq
 …
 - typo to be fixed in wiki pages
 - false positif to be added in a :file:`docs/manual/for_typo/`
 - false positif to be ignored because to hard to add (encoding issue for
+- false positif to be ignored because too hard to add (encoding issue for
   Greek word, etc.)
 …
    w=amonch # take a real one from ${hunspell_out_uniq}
    find ${dirhtml} -name "Doc*" -exec grep -Hi ${w} {} \;
+   find ${dirhtml} -type f -exec grep -Hi ${w} {} \;
 .. note::
 …
    Correction have to be done via the wiki interface of the forge.
+Check typo in attached PDF on wiki pages
+++++++++++++++++++++++++++++++++++++++++
+To get the URIs of all attached files:
+.. code-block:: bash
+   listf=$(find ${dirhtml} -type f)
+   list_attached=${PROJECT_LOG}/list_attached
+   list=${PROJECT_LOG}/list
+   rm -f ${list} ${list_attached}
+   for onefile in ${listf}
+   do
+      xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -t -m "//x:a[@title='Download']" -v "concat('http://forge.ipsl.jussieu.fr/',@href)" -n ${onefile} >> ${list}
+   done
+   sort -u ${list} >  ${list_attached}
+To isolate the PDF files among these URIs:
+.. code-block:: bash
+   list_pdf=$(grep "\.pdf$" ${list_attached})
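The filter above is case-sensitive, so an attachment whose name ends in ``.PDF`` would be skipped. The following sketch compares it with a case-insensitive variant; the URIs are hypothetical (``example.org``), and whether upper-case extensions actually occur on the forge is an assumption to check:

```shell
# Hypothetical attachment list: these URIs are illustrative only.
tmpdir=$(mktemp -d)
printf '%s\n' \
   'http://example.org/raw-attachment/wiki/DocExample/slides.pdf' \
   'http://example.org/raw-attachment/wiki/DocExample/figure.png' \
   'http://example.org/raw-attachment/wiki/DocExample/REPORT.PDF' \
   > "${tmpdir}/list_attached"
# Same filter as above: matches a lower-case .pdf suffix only.
list_pdf=$(grep '\.pdf$' "${tmpdir}/list_attached")
# Variant with -i: also matches .PDF, .Pdf, etc.
list_pdf_nocase=$(grep -i '\.pdf$' "${tmpdir}/list_attached")
echo "${list_pdf}"
echo "${list_pdf_nocase}"
rm -rf "${tmpdir}"
```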
+To download those URIs locally:
+.. code-block:: bash
+   dirpdf=${PROJECT_LOG}/pdf/
+   rm -rf ${dirpdf}
+   for uri in ${list_pdf}
+   do
+       wget -P ${dirpdf} ${uri}
+   done
+We can now convert these PDF files to text files:
+.. code-block:: bash
+   list_pdf=$(find ${dirpdf} -type f)
+   for pdf in ${list_pdf}
+   do
+       pdftotext ${pdf} ${pdf}.txt
+   done
+We can now check for typos in the text files:
+.. code-block:: bash
+   cd ${PROJECT_LOG}
+   listf=$(find ${dirpdf} -type f -name "*.pdf.txt")
+   hunspell_out=${PROJECT_LOG}/hunspell_out
+   hunspell_out_uniq=${PROJECT_LOG}/hunspell_out_uniq
+   rm -f ${hunspell_out}  ${hunspell_out_uniq}
+   for onefile in ${listf}
+   do
+      LC_ALL=C;hunspell -d en_US,nontypo_uniq --check-url -i utf-8 -l < ${onefile} >> ${hunspell_out}
+   done
+   sort -u ${hunspell_out} | sort --ignore-case > ${hunspell_out_uniq}
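The final pipeline first removes exact duplicates, then orders the remaining words without regard to case. A small sketch with made-up words shows the effect; ``LC_ALL=C`` is set here only to make the ordering reproducible and is not part of the original command:

```shell
# Made-up hunspell output: one flagged word per line, with repeats.
tmpdir=$(mktemp -d)
printf '%s\n' Zeta alpha Alpha alpha Beta > "${tmpdir}/hunspell_out"
# sort -u is case-sensitive, so "Alpha" and "alpha" are both kept;
# the second sort then groups them together regardless of case.
LC_ALL=C sort -u "${tmpdir}/hunspell_out" | LC_ALL=C sort --ignore-case > "${tmpdir}/hunspell_out_uniq"
sorted_words=$(cat "${tmpdir}/hunspell_out_uniq")
echo "${sorted_words}"
rm -rf "${tmpdir}"
```

Words that differ only by case therefore end up on adjacent lines, which makes it easier to review them together.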
+:file:`${hunspell_out_uniq}` contains:
+- typos to be fixed in PDF files
+- false positives to be added in a :file:`docs/manual/for_typo/`
+- false positives to be ignored because of bad conversion from PDF to text
+  (e.g. ligatures) or because they are too hard to add (encoding issue for Greek words, etc.)
+To find one of the misspellings in the converted PDF files:
+.. code-block:: bash
+   w=infrastucture # take a real one from ${hunspell_out_uniq}
+   find ${dirpdf} -type f -name "*.pdf.txt" -exec grep -Hi ${w} {} \;


Changeset 11

trunk/docs/manual/for_typo/institution.txt

trunk/docs/manual/for_typo/nemo.txt

trunk/docs/manual/source/developers/guides/typo.rst
